ABSTRACT

In this report, a study is made of information theoretic channels which are decomposable
into a number of parallel subchannels which will, in general, be dependent. For this
situation, two models are constructed in which each subchannel input affects only the
corresponding subchannel output (no crosstalk). In the first model (MC channel), the
lack of crosstalk is ensured by constraints on the channel conditional probability
distribution. The second model (MS channel) is a channel with an underlying state structure
with states independent of the input. Both models are memoryless. All MS channels are
MC, but the reverse does not hold.

The effect of subchannel dependencies on capacity and random coding exponent (RCE) is
investigated. It is proved that these dependencies cannot decrease the capacity of our
channels. However, subchannel dependencies may either increase or decrease the RCE.
It is also proved that the capacity of the channel is not less than the sum of the capacities
of the individual subchannels. When the state model is used, the above two quantities
are equal if the receiver has knowledge of the channel state.

A definition of partial state knowledge is given. It is proved that, when the receiver has
partial state knowledge, the resulting capacity and RCE are not decreased. For complete
state knowledge at the receiver, the capacity and RCE are not less than those obtained
for partial state knowledge.

A restricted class of MS channels is defined wherein all the subchannels are in the same
state during each use of the channel; these channels are called MSCC channels. For
these channels, a number of results are given, most of which concern the limiting behavior
of the capacity per subchannel and the RCE as the number of subchannels becomes
large. The principal results are: (1) the capacity per subchannel has a finite limit; and
(2) the RCE has a finite limit if the rate per subchannel is kept constant and the constant
is sufficiently large. These results hold whether or not the state is known at the
receiver.

Systematic coding and decoding, using BCH codes and minimum distance decoding rules,
are considered for MSCC channels. Various coding alternatives are discussed, and formulas
are given for computing or bounding performance.

PARALLEL CHANNELS WITHOUT CROSSTALK


CHAPTER  1 
INTRODUCTION 


In a typical point-to-point discrete communication situation (see Fig. 1), we have as input to
a transmitter a random message m which may take on one of K values. Corresponding to each
message value $m_k$, there is a distinct waveform $s_k(t)$ which is transmitted in response to the
message input. The transmitted waveform s(t) is corrupted by the waveform channel (fading,
additive noise, attenuation, etc.), and a resultant signal r(t) is the input to the receiver. The
receiver then must decide which message was the input to the transmitter; its decision is denoted
in Fig. 1 as $\hat{m}$. Discrete information theory generally deals with situations where the modulation
(transmitter), waveform channel, and receiver are considered as fixed, and the problems addressed
concern the properties and proper utilization of the resulting combination. This combination
is called the discrete channel. For the purpose of properly utilizing the discrete channel,
we shall be willing to add both pre-transmission and post-reception processing devices. These
are usually called coders and decoders, respectively (see Fig. 2). Sometimes, the receiver will
have knowledge of the condition (state) of the channel, in which case it is assumed that this
information is passed on to the decoder.

Fig. 1. Discrete communication system model.

Fig. 2. Discrete channel with coder and decoder.

Often, in practice, the transmitter, receiver, and waveform channel are such that the single
discrete channel may be profitably viewed as an aggregate of parallel subchannels. This situation
is usually associated with modulation schemes where each subchannel corresponds to a transmitter
frequency interval which does not significantly overlap any of the others. We will, in
general, assume that our parallel subchannels are dependent but without crosstalk. The absence
of crosstalk implies that each subchannel input affects only the corresponding subchannel output.
Such a set of subchannels may, however, be dependent (i.e., the subchannel input-output pairs
are dependent in the usual statistical sense) if the natural disturbance (e.g., fading) affecting
them is itself not independent from subchannel to subchannel.

Some examples of the dependent parallel channel situation we have in mind are scatter channels
(e.g., tropospheric and ionospheric scatter), channels with additive colored Gaussian noise
of unknown spectrum, and channels subject to jamming.

Multiple subchannels taken together are usually a less general type of single channel than
that which the physical constraints on the communication problem alone would suggest we consider.
However, the study of parallel channels is important for three principal reasons. In
the first place, many existing communication systems are built in a multiple-channel form.
These include HF systems, tropospheric scatter systems, satellite systems, and telephone
company equipment of various types. In such situations, the multiple-channel structure is
forced upon the user. A second situation is one which is thought to obtain in optical communication
systems. Here, the bandwidths are so great that no method is presently available or
immediately foreseeable which would allow one to modulate across the entire channel bandwidth
at once. A division of the channel bandwidth into subchannels is a technological necessity.
Finally, given certain physical constraints on a communication problem, a multiple-channel
communication system may always be a candidate for consideration as a solution. It may, in fact,
be the most general type of realization that one is able to analyze, but this will depend on the
behavior of the physical channel.

The communication systems we have been considering are point-to-point systems, where
all information originates at a single point and is to be ultimately received at a single point
physically removed from the first. In what follows, we shall take the discrete information theoretic
point of view and always assume the discrete channel to be given.

CHAPTER 2

MODELS,  DUALITY,  AND  SOME  BASIC  THEOREMS 


A.  PARALLEL  CHANNEL  MODELS 

To model a time discrete channel in the information theoretic sense, we need to define
several elements. First, corresponding to the basic unit of signal duration implied by the time
discreteness, we define an input space X and an output space Y.† We refer to a member x of
X as an input, and to a member y of Y as an output. Let $X^N$ denote the space of sequences of
inputs of length N, and $Y^N$ denote the space of sequences of outputs of length N. We denote a
member of $X^N$ by $x^N = (x_1,\ldots,x_N)$ and a member of $Y^N$ by $y^N = (y_1,\ldots,y_N)$. Then, we define
a set of conditional probability distributions or densities‡ $p_N(y^N/x^N)$, N = 1, 2, ..., on sequences
of inputs and outputs of arbitrary length. Sometimes, a channel state variable is introduced into
the description to account for the memory of the channel, if any. If the channel is memoryless,

$$p_N(y^N/x^N) = \prod_{i=1}^{N} p_1(y_i/x_i), \qquad N = 1, 2, \ldots$$

and X, Y, and $p_1(y/x)$ suffice to specify the channel.
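The memoryless factorization above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper name `sequence_probability`, the nested-dictionary layout `p1[x][y]`, and the binary symmetric example channel with crossover probability 0.1 are hypothetical choices, not part of the report.

```python
from itertools import product


def sequence_probability(p1, xs, ys):
    """Return p_N(y^N / x^N) as the product of single-use probabilities p1[x][y]."""
    if len(xs) != len(ys):
        raise ValueError("input and output sequences must have equal length")
    prob = 1.0
    for x, y in zip(xs, ys):
        prob *= p1[x][y]
    return prob


# Hypothetical single-use channel: binary symmetric, crossover probability 0.1.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

xs = (0, 1, 1)
single = sequence_probability(bsc, xs, (0, 1, 0))  # equals 0.9 * 0.9 * 0.1

# For a fixed input sequence, the probabilities of all output sequences sum to 1.
total = sum(sequence_probability(bsc, xs, ys)
            for ys in product((0, 1), repeat=len(xs)))
print(single, total)
```

The same dictionary layout extends to nonbinary alphabets; only the single-use table `p1` changes.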

The parallel channel models we shall discuss assume that each input $x_i$ is decomposable
into M subelements $x_{1i},\ldots,x_{Mi}$ which we shall call subchannel inputs, and each output $y_i$ is
decomposable into M subelements $y_{1i},\ldots,y_{Mi}$ which we shall call subchannel outputs. We
shall assume that the space of the kth subchannel input (output) at "time" i is independent of
both k and i and denote it by $X_s$ ($Y_s$). Now, in general, we can simply substitute $(x_{1i},\ldots,x_{Mi})$
for $x_i$, and $(y_{1i},\ldots,y_{Mi})$ for $y_i$ in the probability distributions which describe the channel
and we shall have a description in terms of subchannel inputs and outputs. It will be convenient
to make a simplifying assumption which will be in effect throughout most of this report: the
channel will be assumed memoryless. Hence, we shall be interested in a channel description
given by subchannel input and output spaces $X_s$ and $Y_s$ and probability distributions of the
form $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$, $x_i \in X_s$, $y_i \in Y_s$.§

We have not yet finished imposing structure on our channel. Further structure is desirable
in order to model the physical channels we have mentioned in Chapter 1 and to restrict
the situations we wish to consider in order to obtain meaningful results. Moreover, we
shall provide two structural descriptions (models) of quite different sorts and shall have
something to say about their relation to each other. The first model will be in the form of a set
of constraints on the probability distribution $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$. The second will utilize a
channel state structure to describe $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$.

Fig. 3. Conceptual diagram of parallel channels.

† In Chapter 1, we (tacitly) assumed Y = X. Here, we consider a more general situation.

‡ To avoid the tedium of repeating the words "or densities" when the random variables referred to may be
either continuous or discrete, this may be assumed unless otherwise stated.

§ If, contrary to what is implied but not required by the notation, $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ depends on fewer
than M subchannel inputs, we have a highly degenerate situation. We do not wish to consider such situations.


1.  Model  1  -  The  MC  Channel  (M  Subchannel,  Crosstalkless  Channel) 


Suppose a set of M - k subchannel inputs and their corresponding outputs are not to be used
in communicating. These M - k inputs are set to fixed values. For purposes of communication,
we are interested in the conditional probability which relates the k inputs which are used to their
corresponding outputs. Denote the unused inputs and outputs by $x_{j_t}$, $y_{j_t}$ (t = 1, ..., M - k), and
the used inputs and outputs by $x_{i_t}$, $y_{i_t}$ (t = 1, ..., k). (Note that $\{j_t\}_{t=1}^{M-k}$ and $\{i_t\}_{t=1}^{k}$ are disjoint
sets and that their union is the set of integers from 1 to M.) What we wish is
$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M)$. Now,

$$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M) = \sum_{y_{j_1}} \cdots \sum_{y_{j_{M-k}}} p(y_1,\ldots,y_M/x_1,\ldots,x_M). \tag{2-1}$$

As the notation implies, the LHS of Eq. (2-1) is, in general, dependent upon all M subchannel
inputs $x_1,\ldots,x_M$. However, if for a particular $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ the
LHS of Eq. (2-1) does not depend on $x_{j_1},\ldots,x_{j_{M-k}}$, then the values to which these latter are set
do not affect the used inputs and outputs in the least. If this is the case,

$$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M) = p(y_{i_1},\ldots,y_{i_k}/x_{i_1},\ldots,x_{i_k}). \tag{2-2}$$


We then say that there is no crosstalk between the used and unused subchannels. If this is the
case for all k = 1, ..., M - 1 and all $\{i_t\}_{t=1}^{k}$, then we refer to our channel as an MC channel. This
name is chosen for brevity rather than explicit descriptiveness, because we will need to repeat
it often. The condition we have derived can be stated very simply: A parallel channel with M
subchannels is an MC channel if a summation† of $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ over all values of
the members of any subset of $\{y_i\}_{i=1}^{M}$ destroys the dependence on the corresponding subset of
$\{x_i\}_{i=1}^{M}$.‡

Here, a terminological note is appropriate. We have assumed that the channel input is decomposable
into the same number of elements as the channel output. Hence, we may refer to
a pair consisting of an input subelement and its corresponding output subelement as a subchannel.
The correspondence we speak of is only clear if the channel is an MC channel. In fact, we may
say that a channel is an MC channel if and only if there is some way to pair the input and
output subelements so that Eq. (2-2) is true for all k, $1 \le k \le M - 1$, and all $1 \le i_t \le M$,
$1 \le t \le k$. The pairing need not be unique.

† An integration is required if $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ is a density. We shall not bother to state this explicitly
again.

‡ In the discussion above, the values of $x_{j_1},\ldots,x_{j_{M-k}}$ need not be considered fixed. We could have assumed
at the beginning of the description of the MC channel that k subchannels were used by user A and the remaining
M - k by user B. If it is desired that user B's input not affect user A's output, we get our MC channel model.

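We summarize with the following definition: A memoryless channel consisting of M subchannels
each with input space $X_s$ and output space $Y_s$ and characterized by the conditional probability
$p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ is an MC channel if for each k, $1 \le k \le M - 1$, and for each
set of distinct indices $i_t$ with $1 \le i_t \le M$, $1 \le t \le k$, Eq. (2-2) holds:

$$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M) = p(y_{i_1},\ldots,y_{i_k}/x_{i_1},\ldots,x_{i_k}).$$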

Although the definition of MC channel we have used assures us that disjoint sets of subchannels
are mutually noninterfering however they are composed, the verification of the MC property is
rather tedious if the number of subchannels is large. In fact, the number z of different equations
of the form of Eq. (2-2) which must be satisfied is given by

$$z = \sum_{k=1}^{M-1} \binom{M}{k} = 2^M - 2. \tag{2-3}$$

Fortunately,  this  number  can  be  reduced  to  M  by  making  use  of  the  following  theorem. 
Theorem  2.1. 

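A memoryless channel consisting of M subchannels each with input space $X_s$ and output
space $Y_s$ and characterized by the conditional probability $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ is an MC
channel if and only if for each i, $1 \le i \le M$,

$$\sum_{y_i \in Y_s} p(y_1,\ldots,y_M/x_1,\ldots,x_M) = p(y_1,\ldots,y_{i-1},y_{i+1},\ldots,y_M/x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_M). \tag{2-4}$$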

Proof. 


(1) Necessity is proved simply by observing that Eq. (2-4) is equivalent to Eq. (2-2) for
k = M - 1 and $i_1,\ldots,i_k$ distinct.

(2) To prove sufficiency, pick k, $1 \le k < M$. Let $\{i_t\}_{t=1}^{k}$ be a set of k distinct integers
each satisfying $1 \le i_t \le M$. Let $\{j_t\}_{t=1}^{M-k}$ consist of the remaining (M - k) integers satisfying
$1 \le j_t \le M$. Recall

$$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M) = \sum_{y_{j_1} \in Y_s} \cdots \sum_{y_{j_{M-k}} \in Y_s} p(y_1,\ldots,y_M/x_1,\ldots,x_M). \qquad \text{[Eq. (2-1)]}$$

If we assume the theorem is false, then for some integer q, $1 \le q \le M - k$, the LHS of Eq. (2-1)
depends on $x_{j_q}$. Now the sums on the RHS of Eq. (2-1) may be formed in any order.† Hence,

$$p(y_{i_1},\ldots,y_{i_k}/x_1,\ldots,x_M) = \sum_{y_{j_1}} \cdots \sum_{y_{j_{q-1}}} \sum_{y_{j_{q+1}}} \cdots \sum_{y_{j_{M-k}}} \sum_{y_{j_q}} p(y_1,\ldots,y_M/x_1,\ldots,x_M) \tag{2-5}$$

$$= \sum_{y_{j_1}} \cdots \sum_{y_{j_{q-1}}} \sum_{y_{j_{q+1}}} \cdots \sum_{y_{j_{M-k}}} p(y_1,\ldots,y_{j_q-1},y_{j_q+1},\ldots,y_M/x_1,\ldots,x_{j_q-1},x_{j_q+1},\ldots,x_M) \tag{2-6}$$

where the innermost sum has been evaluated by Eq. (2-4) with $i = j_q$.
Since each summand on the RHS of Eq. (2-6) is independent of $x_{j_q}$, the sum, and hence the LHS
of Eq. (2-6), is independent of $x_{j_q}$. Thus, we have a proof by contradiction.

† This is true even if $Y_s$ is not finite; see, for example, W. Rudin, Principles of Mathematical Analysis, 2nd
edition (McGraw-Hill, New York, 1964), Theorem 8.3. If $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ is a conditional density
and integrals replace sums, then the Fubini Theorem allows us to integrate in any order; see, for example,
H. L. Royden, Real Analysis (Macmillan, New York, 1963), p. 233.

We note that in the MC channel model, although we have defined subchannel inputs and outputs,
the subchannels themselves are not identifiable. Our second model will have identifiable
subchannels.
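The M conditions of Theorem 2.1 lend themselves to a direct numerical test on small discrete channels. The Python sketch below is one possible way to do this, not a procedure taken from the report: the dictionary layout `p[x][y]` (with `x` and `y` tuples of length M), the helper names, and both example channels are assumptions made for illustration.

```python
from itertools import product


def marginal(p, x, keep):
    """Sum p(y/x) over every output whose index is not in `keep`."""
    out = {}
    for y, prob in p[x].items():
        key = tuple(y[k] for k in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def is_mc_channel(p, M, tol=1e-12):
    """Check the M conditions of Eq. (2-4): for each i, summing over y_i
    must leave a distribution that does not depend on x_i."""
    for i in range(M):
        keep = [k for k in range(M) if k != i]
        seen = {}  # remaining inputs -> marginal on the remaining outputs
        for x in p:
            rest = tuple(x[k] for k in keep)
            marg = marginal(p, x, keep)
            ref = seen.setdefault(rest, marg)
            keys = set(ref) | set(marg)
            if any(abs(ref.get(k, 0.0) - marg.get(k, 0.0)) > tol for k in keys):
                return False
    return True


def bsc(eps):
    """Binary symmetric subchannel with crossover probability eps."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


bits = (0, 1)
a, b = bsc(0.1), bsc(0.2)

# Hypothetical example 1: two independent binary symmetric subchannels.
independent = {(x1, x2): {(y1, y2): a[x1][y1] * b[x2][y2]
                          for y1, y2 in product(bits, repeat=2)}
               for x1, x2 in product(bits, repeat=2)}

# Hypothetical example 2: output 2 is x1 XOR x2, so input 1 leaks into output 2.
crosstalk = {(x1, x2): {(x1, x1 ^ x2): 1.0}
             for x1, x2 in product(bits, repeat=2)}

print(is_mc_channel(independent, 2), is_mc_channel(crosstalk, 2))
```

Only M marginalizations per input are needed here, in contrast to the $2^M - 2$ equations required by a direct check of the definition.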


2.  Model  2  -  The  MS  Channel  (M  Subchannel,  State  Description  Channel) 

Suppose we have a set of M subchannels each of which may be in one of a number of states.
We call the set of subchannel states A. Associated with each $\alpha \in A$, there is a subchannel
conditional probability distribution† $p_\alpha(\xi/\eta)$, $\xi \in Y_s$, $\eta \in X_s$. We let $\alpha_i$ denote the state of the
ith subchannel, and $\alpha = (\alpha_1,\ldots,\alpha_M)$ denote the state of the (whole) channel consisting of M
subchannels. We call $\alpha$ the channel state vector. We assume that a probability distribution‡
$p(\alpha_1,\ldots,\alpha_M)$ on the subchannel states is given. This is equivalent to a distribution $p(\alpha)$ on the
(whole) channel state. Let§

$$p(y_1,\ldots,y_M/x_1,\ldots,x_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1,\ldots,\alpha_M)\, p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M). \tag{2-7}$$

If we write y for $(y_1,\ldots,y_M)$, x for $(x_1,\ldots,x_M)$, and $p_\alpha(y/x)$ for
$p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M)$,
then Eq. (2-7) can be written in the more condensed form

$$p(y/x) = \sum_{\alpha \in A^M} p(\alpha)\, p_\alpha(y/x). \tag{2-8}$$

A memoryless channel consisting of M subchannels each with input space $X_s$ and output space
$Y_s$ and characterized by the conditional probability $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$ defined by Eq. (2-7)
is called an MS channel (see Fig. 4).

† These may be densities.

‡ This may be a density.

§ If $p(\alpha_1,\ldots,\alpha_M)$ is a density, the sums over $\alpha_1,\ldots,\alpha_M$ become integrals.


Fig. 4. The MS channel.


Suppose that we chose instead the apparently more general model where the sets $A_i$,
i = 1, ..., M of subchannel states were allowed to be different. In fact, letting $A = \bigcup_{i=1}^{M} A_i$, we
can represent this situation as an MS channel. Hence, these two models are equivalent and we
have chosen the one which is notationally a trifle more simple.

Some relations between the MC and MS channels will now be given. Referring to Eq. (2-7),
one may see that a summation of any $y_i$ over $Y_s$ destroys the dependence of the summed expression
on $x_i$ because $\sum_{y_i \in Y_s} p_{\alpha_i}(y_i/x_i) = 1$, for all $\alpha_i \in A$, and $x_i \in X_s$. Hence, by Theorem 2.1,
we see immediately that every MS channel is an MC channel.
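The construction in Eq. (2-7) and the marginalization argument just given can be checked numerically. The following Python sketch builds a small MS channel from a state distribution and per-state subchannel distributions, then confirms that summing out one output removes the dependence on the corresponding input. The state names, crossover probabilities, state probabilities, and function names are all hypothetical values chosen for the illustration.

```python
from itertools import product


def ms_channel(state_probs, sub, M, X, Y):
    """Build p(y/x) by Eq. (2-7).

    state_probs maps a state vector (a_1, ..., a_M) to its probability;
    sub[a][x][y] is the subchannel distribution p_a(y/x) in state a.
    """
    p = {}
    for x in product(X, repeat=M):
        row = {}
        for y in product(Y, repeat=M):
            total = 0.0
            for alpha, prob in state_probs.items():
                term = prob
                for i in range(M):
                    term *= sub[alpha[i]][x[i]][y[i]]
                total += term
            row[y] = total
        p[x] = row
    return p


def output_marginal(p, x, i):
    """Distribution of the single output with (0-based) index i, given input x."""
    out = {}
    for y, prob in p[x].items():
        out[y[i]] = out.get(y[i], 0.0) + prob
    return out


def bsc(eps):
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


# Hypothetical two-state example with positively correlated subchannel states.
sub = {"good": bsc(0.01), "bad": bsc(0.4)}
state_probs = {("good", "good"): 0.45, ("bad", "bad"): 0.45,
               ("good", "bad"): 0.05, ("bad", "good"): 0.05}
p = ms_channel(state_probs, sub, 2, (0, 1), (0, 1))

# Changing the first input leaves the distribution of the second output unchanged.
m_a = output_marginal(p, (0, 0), 1)
m_b = output_marginal(p, (1, 0), 1)
print(m_a, m_b)
```

In this example the two subchannels are statistically dependent (the joint probability of a pair of outputs is not the product of the single-output probabilities), yet neither input influences the other subchannel's output.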

Although the fact is somewhat surprising, it is not true that every MC channel is an MS
channel. A counterexample is discussed in Appendix A.

The emphasis in this work will be on MS rather than on MC channels, which latter are of
doubtful engineering interest when they cannot be modeled as MS channels. We shall, however,
assume the more general MC channel model when a result follows naturally from this assumption.

B.  DUALITY  BETWEEN  TIME  AND  PARALLEL  DIRECTIONS 

The mathematical descriptions of a memoryless parallel channel bear a strong resemblance
to those of a single channel with memory. In both cases, we start with base spaces $X_s$ and $Y_s$
for the basic indecomposable inputs and outputs. An input to or output of an MS or MC channel
is a member of the product space $X_s^M$ or $Y_s^M$. Similarly, if we have but a single channel, a
sequence of its inputs or outputs of length N is a member of $X_s^N$ or $Y_s^N$. The MS and MC channels
are characterized by a conditional probability distribution defined over $Y_s^M \times X_s^M$. Insofar
as one is interested only in transmitting a sequence of inputs of length N,† a single channel with
memory is sufficiently characterized by a conditional probability distribution defined over
$Y_s^N \times X_s^N$. This is not to say that there may not exist simpler characterizations in particular
cases.

† N may be the block length of a block code, or large enough so that N times the basic unit of signal duration
is equal to the lifetime of the equipment.

The purpose of mentioning the duality between time and parallel directions is twofold: first,
it enables us to borrow results obtained for single channels, possibly with memory, to use for
our parallel channel model; second, some of the results obtained here will apply to single channels
with memory.

We may define channels with memory to correspond to our MC channels: a channel with no
intersymbol interference (NII channel) is defined as one for which given any integer i, $1 \le i \le N$,

$$\sum_{y_i \in Y_s} p(y_1,\ldots,y_N/x_1,\ldots,x_N) = p(y_1,\ldots,y_{i-1},y_{i+1},\ldots,y_N/x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_N). \tag{2-9}$$

The correspondence between NII and MC channels is made clear by Theorem 2.1. We may also
define the time analog of the MS channel: let A be a set of channel states; $p_\alpha(\xi/\eta)$, $\xi \in Y_s$,
$\eta \in X_s$, $\alpha \in A$, a conditional probability; $\alpha_i$ denote the channel state at the ith time instant and
$p(\alpha_1,\ldots,\alpha_N)$ the probability distribution over channel state sequences of length N. Let

$$p(y_1,\ldots,y_N/x_1,\ldots,x_N) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_N \in A} p(\alpha_1,\ldots,\alpha_N)\, p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_N}(y_N/x_N). \tag{2-10}$$

A channel with conditional probability distribution given by Eq. (2-10) is an MST channel.

Now, all MST channels are NII for the same reason that MS channels are MC. Not every NII
channel is MST. Basically, the same counterexample which was used to show that not every MC
channel is MS can be used here. (See Appendix A.)

For NII channels, if k, i are chosen so that $1 \le k < i \le N$, then $p(y_k,\ldots,y_i/x_k,\ldots,x_i)$ is
independent of the input distribution $p(x_1,\ldots,x_N)$ and is characteristic of the channel alone.†
Thus, for purposes of block coding, we may take N as the block length and obtain a sufficient
and unique characterization of the channel.‡

† This statement will appear subsequently in its "parallel" form. It may be proved by applying Eq. (2-9) and
Theorem 2.1 to the calculation of $p(y_k,\ldots,y_i/x_k,\ldots,x_i)$.

‡ Since no stationarity condition has been imposed, there is no guarantee that the channel will remain the same
from block to block.

For NII channels, we may also define stationarity. An NII channel is stationary if for any
integers j, k, and $\ell$ satisfying $0 \le j < j + \ell \le N$, $0 \le k < k + \ell \le N$ we have

$$p(y_{j+1},\ldots,y_{j+\ell}/x_{j+1},\ldots,x_{j+\ell}) = p(y_{k+1},\ldots,y_{k+\ell}/x_{k+1},\ldots,x_{k+\ell}) \tag{2-11}$$

whenever

$$y_{k+i} = y_{j+i} \quad \text{and} \quad x_{k+i} = x_{j+i}, \qquad 1 \le i \le \ell.$$

In other words, only the length and values of an input-output sequence matter, not the starting
point.

For MST channels, if the distribution of the $\alpha_i$'s is stationary, then the channel will be stationary
as well.

C.  MUTUAL  INFORMATION  AND  CAPACITY 

We now wish to examine the effect of subchannel dependencies on the mutual information
between input and output of a channel with independent subchannel inputs, and to compare this
mutual information with the mutual information between input and output of the individual
subchannels. A comparison is implied between an original channel with dependencies and a derived
channel without them. Suppose we are given a channel with subchannel inputs $x_1,\ldots,x_M$,
subchannel outputs $y_1,\ldots,y_M$, and conditional probability distribution $p(y_1,\ldots,y_M/x_1,\ldots,x_M)$.
Suppose, too, that we are given an input probability distribution

$$p(x_1,\ldots,x_M) = \prod_{i=1}^{M} p_i(x_i)$$

where $p_i(\xi)$ is a single-subchannel input distribution, i = 1, ..., M. Furthermore, let $\sum_{\bar{X}_i}$ and
$\sum_{\bar{Y}_i}$ denote summation over all but the ith input and output, respectively. Then, the
single-subchannel conditional probability distributions $p_i(y_i/x_i)$, $1 \le i \le M$, are given by

$$p_i(y_i/x_i) = \sum_{\bar{X}_i} \sum_{\bar{Y}_i} p(y_1,\ldots,y_M/x_1,\ldots,x_M) \prod_{j \ne i} p_j(x_j). \tag{2-12}$$

We define a dependence-removed (DR) channel with conditional probability distribution given by

$$p_{DR}(y_1,\ldots,y_M/x_1,\ldots,x_M) = \prod_{i=1}^{M} p_i(y_i/x_i). \tag{2-13}$$


Theorem  2.2. 

Taking the usual definition of mutual information,†

$$I(X_1,\ldots,X_M; Y_1,\ldots,Y_M) \ge \sum_{i=1}^{M} I(X_i; Y_i). \tag{2-14}$$

If we denote the mutual information between input and output of the dependence-removed channel
by $I_{DR}(X_1,\ldots,X_M; Y_1,\ldots,Y_M)$, then

$$\sum_{i=1}^{M} I(X_i; Y_i) = I_{DR}(X_1,\ldots,X_M; Y_1,\ldots,Y_M). \tag{2-15}$$

† The use of $X_i$ or $Y_i$, i = 1, ..., M, as an argument of the informational expressions implies that an expression
which is a function of $x_i$ or $y_i$ is averaged over $X_i$ or $Y_i$.

Proof.

First we note that¹†

$$I(X_i; Y_i) = H(X_i) - H(X_i/Y_i), \qquad i = 1, \ldots, M \tag{2-16}$$

and

$$I(X_1,\ldots,X_M; Y_1,\ldots,Y_M) = H(X_1,\ldots,X_M) - H(X_1,\ldots,X_M/Y_1,\ldots,Y_M). \tag{2-17}$$

These expressions hold for both discrete and continuously distributed variables. Since the
subchannel inputs are independently distributed,²

$$H(X_1,\ldots,X_M) = \sum_{i=1}^{M} H(X_i). \tag{2-18}$$

We have also that

$$H(X_1,\ldots,X_M/Y_1,\ldots,Y_M) = H(X_1/Y_1,\ldots,Y_M) + H(X_2/X_1,Y_1,\ldots,Y_M) + \cdots + H(X_M/X_1,\ldots,X_{M-1},Y_1,\ldots,Y_M).$$

Since for any random variables U, V, W,³

$$H(U/VW) \le H(U/V)$$

we obtain the inequality

$$H(X_1,\ldots,X_M/Y_1,\ldots,Y_M) \le \sum_{i=1}^{M} H(X_i/Y_i). \tag{2-19}$$

Hence, combining Eqs. (2-16) through (2-19), we get

$$I(X_1,\ldots,X_M; Y_1,\ldots,Y_M) \ge \sum_{i=1}^{M} I(X_i; Y_i)$$

as required. The proof of Eq. (2-15) is an immediate consequence of the definition of $p_{DR}$
in Eq. (2-13).

Note that for an MC channel the values of $p_i(y_i/x_i)$ computed from Eq. (2-12) are independent
of the input distribution $p(x_1,\ldots,x_M)$. Hence, corresponding to each MC channel there is a
unique dependence-removed channel. This is not true, in general.
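A numerical illustration of Eqs. (2-12) through (2-15) may be helpful at this point. The Python sketch below computes the single-subchannel distributions, the dependence-removed channel, and the three mutual informations for a small example with independent inputs. It is a sketch under stated assumptions: the two-state common-state channel, its crossover probabilities (0.05 and 0.45), the input marginals, and all function names are hypothetical and are not taken from the report.

```python
from itertools import product
from math import log2


def mutual_information(px, p):
    """I(X;Y) in bits for input distribution px[x] and channel p[x][y]."""
    py = {}
    for x, qx in px.items():
        for y, pyx in p[x].items():
            py[y] = py.get(y, 0.0) + qx * pyx
    total = 0.0
    for x, qx in px.items():
        for y, pyx in p[x].items():
            if qx > 0 and pyx > 0:
                total += qx * pyx * log2(pyx / py[y])
    return total


def subchannel(p, marginals, i):
    """Eq. (2-12): average over the other inputs, sum over the other outputs."""
    out = {}
    for x, row in p.items():
        weight = 1.0
        for j, pj in enumerate(marginals):
            if j != i:
                weight *= pj[x[j]]
        dest = out.setdefault(x[i], {})
        for y, pyx in row.items():
            dest[y[i]] = dest.get(y[i], 0.0) + weight * pyx
    return out


def bsc(eps):
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


bits = (0, 1)
good, bad = bsc(0.05), bsc(0.45)

# Hypothetical channel: both subchannels share one state, good or bad, each w.p. 1/2.
p = {(x1, x2): {(y1, y2): 0.5 * good[x1][y1] * good[x2][y2]
                          + 0.5 * bad[x1][y1] * bad[x2][y2]
                for y1, y2 in product(bits, repeat=2)}
     for x1, x2 in product(bits, repeat=2)}

# Independent subchannel inputs (a product distribution).
marginals = [{0: 0.3, 1: 0.7}, {0: 0.6, 1: 0.4}]
px = {(x1, x2): marginals[0][x1] * marginals[1][x2]
      for x1, x2 in product(bits, repeat=2)}

subs = [subchannel(p, marginals, i) for i in range(2)]
i_sum = sum(mutual_information(marginals[i], subs[i]) for i in range(2))
i_joint = mutual_information(px, p)

# Eq. (2-13): the dependence-removed channel is the product of the subchannels.
p_dr = {x: {y: subs[0][x[0]][y[0]] * subs[1][x[1]][y[1]]
            for y in product(bits, repeat=2)}
        for x in px}
i_dr = mutual_information(px, p_dr)

print(i_sum, i_dr, i_joint)
```

For this example the printed values should satisfy `i_sum == i_dr` up to rounding and `i_sum <= i_joint`, in agreement with Eqs. (2-15) and (2-14).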

In the remainder of this report, we will have frequent need to compare constants defined as
maxima of functions of several variables and to compare functions of one variable defined as
maxima over the remaining variables of functions of several variables. This comparison, which
will usually take the form of an inequality between two non-negative quantities, will be facilitated
by the two theorems which will be stated below. First, it will be necessary to explain some
notation and give a definition.

A probability measure over an input space consisting of a finite number K of points can be
represented as a vector p in K-dimensional Euclidean space.

† Numbered references appear at the end of each chapter.

Suppose we have a parallel channel consisting of M subchannels. If $p_i(x_i)$, i = 1, ..., M
are probability distributions over $X_s$, then $\prod_{i=1}^{M} p_i(x_i)$ is a product distribution over $X = X_s^M$.


Theorem  2.3. 


Let P be the set of probability distributions over a finite† product space $X_s^M$, and D the
set of product distributions over $X_s^M$. Let $\rho$ and R be real variables with $0 \le \rho \le 1$, and
$p \in P$. Let f and g each be continuous real valued functions of $\rho$, p, and R. Define

$$F_D(R) = \max_{0 \le \rho \le 1,\; p \in D} f(\rho, p, R)$$

$$G_D(R) = \max_{0 \le \rho \le 1,\; p \in D} g(\rho, p, R)$$

$$F_P(R) = \max_{0 \le \rho \le 1,\; p \in P} f(\rho, p, R)$$

$$G_P(R) = \max_{0 \le \rho \le 1,\; p \in P} g(\rho, p, R)$$

Then,

(1) $F_D(R)$, $G_D(R)$, $F_P(R)$, and $G_P(R)$ are all finite.

(2) Suppose $f \le g$ for all $0 \le \rho \le 1$ and $p \in D$; then

$$F_D(R) \le G_D(R) \le G_P(R) \tag{2-20}$$

and $F_D(R) < G_D(R)$ if $f < g$ for all $0 \le \rho \le 1$ and $p \in D$.

(3) Suppose $f \le g$ for all $0 \le \rho \le 1$ and $p \in P$; then Eq. (2-20) holds and,
in addition,

$$F_D(R) \le F_P(R) \le G_P(R). \tag{2-21}$$

Furthermore, $F_P(R) < G_P(R)$ if $f < g$ for all $0 \le \rho \le 1$ and $p \in P$.

The proof is given in Appendix B.

† This implies that $X_s$ is a finite set, and M a finite positive integer.

Theorem 2.4.

Let P be the set of probability distributions over a product space $X_s^M$ and D the set of
product distributions over $X_s^M$. Let $\rho$ and R be real variables with $0 \le \rho \le 1$ and $p \in P$. Let
f and g each be real valued functions (functionals) of $\rho$, p, and R. Define

$$F_D(R) = \operatorname*{l.u.b.}_{0 \le \rho \le 1,\; p \in D} f(\rho, p, R)$$

$$G_D(R) = \operatorname*{l.u.b.}_{0 \le \rho \le 1,\; p \in D} g(\rho, p, R)$$

$$F_P(R) = \operatorname*{l.u.b.}_{0 \le \rho \le 1,\; p \in P} f(\rho, p, R)$$

$$G_P(R) = \operatorname*{l.u.b.}_{0 \le \rho \le 1,\; p \in P} g(\rho, p, R)$$

Then,

(1) If $f \le g$ for all $0 \le \rho \le 1$ and $p \in D$, we have

$$F_D(R) \le G_D(R) \le G_P(R). \tag{2-22}$$

(2) If $f \le g$ for all $0 \le \rho \le 1$ and $p \in P$, Eq. (2-22) holds and, in addition,

$$F_D(R) \le F_P(R) \le G_P(R). \tag{2-23}$$

The proof is given in Appendix B.

Theorem  2.5. 


Suppose we are given an MC channel, with $X^M$ finite. Denote the capacity of the dependence-removed channel by $C_{DR}$, the capacity of the $i$th subchannel [with conditional probability distribution given by Eq. (2-12)] by $C_i$, and the capacity of the original channel (consisting of $M$ subchannels) by $C$. Then,

$$C \ge C_{DR} = \sum_{i=1}^{M} C_i .$$ (2-24)


Proof. 

Since the $p_i(y_i/x_i)$ are unique, the verification of Eq. (2-24) is straightforward. First, the capacity of the dependence-removed channel is achieved with independent subchannel inputs⁴ (i.e., a product distribution maximizes the mutual information). Hence, the equality part of Eq. (2-24) comes from Eq. (2-15). Let us use a superscript $p$ to make explicit the input distribution which is involved in the calculation of mutual information between input and output of our channel. Then, since

$$C = \max_{p \in P} I^{p}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M)$$

and

$$C_{DR} = \max_{p \in D} I^{p}_{DR}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M) ,$$

Eqs. (2-14), (2-15), and Theorem 2.3 give us the inequality part of Eq. (2-24).
Theorem 2.6.

Suppose we are given an MC channel. Let $C$, $C_{DR}$, and $C_i$ be as above. Then, Eq. (2-24) holds.

Proof. 

The  proof  is  analogous  to  that  of  Theorem  2.5.  Theorem  7.2.2  replaces  Theorem  4.2.1  in 
Ref.  4,  and  our  Theorem  2.4  replaces  Theorem  2.3. 


Hence, whether or not $X^M$ is finite, the capacity of an MC channel cannot be increased by removal of its dependencies.

Obviously,  this  result  applies  to  Nil  channels  as  well.  However,  if  it  is  the  capacity  per 

use  c  of  an  Nil  channel  that  we  are  concerned  with,  conditional  probability  distributions  arc 
N  N 

defined  on  x  for  all  positive  integers  N,  and  we  define 


and 


C 


=  lim 


DR 


N 


Then, 

N 

c  ^  ^  E  C.  (2-25) 

°°  i=l 

so that the capacity per use of an Nil channel is not increased by removal of its dependencies. If the channel is stationary, the middle expression in Eq. (2-25) is simply the capacity for one-shot use of the channel.

To abbreviate the description of channel examples in the remainder of this report, we will agree to call an MC channel with $M$ subchannels, $X_i = \{1, \ldots, L\}$ and $Y_i = \{1, \ldots, Q\}$ an $M \times L \times Q$ channel.

Both  the  inequality  of  Eq.  (2-14)  and  the  inequality  part  of  Eq.  (2-24)  may  be  strict.  This 
can  be  shown  by  the  following  example  of  a  2  X  2  X  2  channel: 


                    x1x2
  y1y2      00     01     10     11
   00      5/8    1/8    1/8    1/8
   01      1/8    5/8    1/8    1/8
   10      1/8    1/8    5/8    1/8
   11      1/8    1/8    1/8    5/8

                p(y1y2/x1x2)
This example is MS as well as MC.† Capacity (0.452 bit) is achieved by the following input distribution‡ $p(x_1, x_2)$:

$$p(00) = p(01) = p(10) = p(11) = 1/4 .$$ (2-26)

The  dependence -removed  channel  is: 


                    x1x2
  y1y2      00      01      10      11
   00      9/16    3/16    3/16    1/16
   01      3/16    9/16    1/16    3/16
   10      3/16    1/16    9/16    3/16
   11      1/16    3/16    3/16    9/16

$C_{DR}$ (0.378 bit) is also achieved by the input distribution Eq. (2-26).‡ Hence, the inequality part of Eq. (2-24) may be strict. Since Eq. (2-26) is a product distribution, both $C$ and $C_{DR}$ are achieved with independent inputs. Thus, the hypothesis for Theorem 2.2 is obeyed and the inequality in Eq. (2-14) may be strict as well. A continuity argument shows that Eq. (2-14) may still hold for dependent subchannel inputs.
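The two figures quoted for this example can be checked numerically. The following is a minimal sketch, not the report's own computation: it evaluates the mutual information at the uniform input of Eq. (2-26) for the 2 × 2 × 2 channel above and for its dependence-removed version. The function and variable names are illustrative assumptions; the results agree with the quoted values to about two decimal places.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

# Index 2*a + b stands for the pair ab (00, 01, 10, 11).
channel = [[5/8 if x == y else 1/8 for y in range(4)] for x in range(4)]

# Subchannel marginals p_i(y_i/x_i). For an MC channel the other
# subchannel's input is irrelevant, so it is fixed at 0 here.
p1 = [[channel[2*a][2*b] + channel[2*a][2*b + 1] for b in range(2)]
      for a in range(2)]
p2 = [[channel[a][b] + channel[a][2 + b] for b in range(2)]
      for a in range(2)]

# Dependence-removed channel: product of the subchannel marginals.
removed = [[p1[x // 2][y // 2] * p2[x % 2][y % 2] for y in range(4)]
           for x in range(4)]

uniform = [1/4] * 4
c = mutual_information(uniform, channel)
c_dr = mutual_information(uniform, removed)
print(f"I at uniform input, original channel:           {c:.4f} bit")
print(f"I at uniform input, dependence-removed channel: {c_dr:.4f} bit")
```

The sketch relies on the text's statement that the uniform input achieves capacity in both cases; it does not search over input distributions.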

The example we have been discussing may also be used to show that for dependent subchannel inputs neither Eqs. (2-14) nor (2-15) need hold. For the input distribution $p(x_1, x_2)$ given by

$$p(00) = p(11) = 1/2$$

$$p(01) = p(10) = 0$$

we have

$$I(X_1, X_2;\, Y_1, Y_2) = 0.262 \text{ bit}$$

$$\sum_{i=1}^{2} I(X_i; Y_i) = 0.379 \text{ bit}$$

$$I_{DR}(X_1, X_2;\, Y_1, Y_2) = 0.329 \text{ bit}$$
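As a rough numerical check of the three quantities just listed, the sketch below recomputes them for the dependent input $p(00) = p(11) = 1/2$. It assumes, from the channel matrix, that each subchannel marginal is a binary symmetric channel with crossover probability 1/4; all names are illustrative. The computed values agree with the quoted ones to about two decimal places and preserve their ordering.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

bsc = [[3/4, 1/4], [1/4, 3/4]]   # subchannel marginal p_i(y_i/x_i)
channel = [[5/8 if x == y else 1/8 for y in range(4)] for x in range(4)]
removed = [[bsc[x // 2][y // 2] * bsc[x % 2][y % 2] for y in range(4)]
           for x in range(4)]

p = [1/2, 0, 0, 1/2]             # p(00), p(01), p(10), p(11)
p_x1 = [p[0] + p[1], p[2] + p[3]]
p_x2 = [p[0] + p[2], p[1] + p[3]]

i_joint = mutual_information(p, channel)
i_sum = mutual_information(p_x1, bsc) + mutual_information(p_x2, bsc)
i_removed = mutual_information(p, removed)
print(f"I(X1,X2;Y1,Y2)    = {i_joint:.3f} bit")
print(f"sum of I(Xi;Yi)   = {i_sum:.3f} bit")
print(f"I_DR(X1,X2;Y1,Y2) = {i_removed:.3f} bit")
```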


One might be tempted to conjecture that an MC channel always achieves capacity for independent subchannel inputs. The following example of a 2 × 2 × 2 channel disproves this conjecture:

                    x1x2
  y1y2      00     01     10     11
   00      0.5    0      0      0
   01      0      0.5    0      0
   10      0      0      0.5    0
   11      0.5    0.5    0.5    1

                p(y1y2/x1x2)


† See Example 1 in Chapter 3, p. 17.
‡ Theorem 4.5.1 of Ref. 4 provides the means of proof.
One may verify that this is an MC channel (it is MS as well). Capacity (0.806 bit) is achieved only by the following distribution† $p(x_1, x_2)$:

$$p(00) = p(01) = p(10) = 2/7 ; \qquad p(11) = 1/7 .$$

Since $p_1(x_1)$ and $p_2(x_2)$ are both given by $p(0) = 4/7$ and $p(1) = 3/7$, capacity is not achieved by independent subchannel inputs.
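The claim that no product input reaches capacity on this channel can be illustrated numerically. The sketch below is only a grid search under stated assumptions (a 201 × 201 grid over the two marginal probabilities, illustrative names); it is not a proof. It compares the mutual information at the distribution above with the best value found over product distributions.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

# Rows are inputs x1x2 = 00, 01, 10, 11; columns are outputs y1y2.
channel = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

p_star = [2/7, 2/7, 2/7, 1/7]
i_star = mutual_information(p_star, channel)

# Product inputs: a = P(x1 = 0), b = P(x2 = 0), searched on a grid.
steps = 200
best_product = max(
    mutual_information([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)],
                       channel)
    for a in (i / steps for i in range(steps + 1))
    for b in (j / steps for j in range(steps + 1))
)
print(f"I at (2/7, 2/7, 2/7, 1/7):  {i_star:.4f} bit")
print(f"best product input on grid: {best_product:.4f} bit")
```

On this grid the best product input falls a few thousandths of a bit short of the value at the dependent distribution, in line with the text.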

Since Theorems 2.2 and 2.5 deal with channels without a prescribed state structure, they naturally have nothing to say about the situation where the channel state is known to the receiver. Suppose, however, we are given an MS channel. If we consider the "output" in the state known case to be a doublet $(y, \alpha) \in (Y, A^M)$, then we see that this is just a special case of the general channel with state unknown to the receiver. Furthermore, since the channel state is independent of the input and the conditional probability distribution corresponding to a single channel state (a product distribution) satisfies the MC constraints, the channel with doublet output is MC. Hence, Theorems 2.2 and 2.5 hold for MS channels whether or not the channel state is known to the receiver.‡

We  conclude  Chapter  2  with  a  theorem  which  applies  only  to  the  situation  where  the  channel 
state  is  known  to  the  receiver. 

Theorem  2.7. 

For an MS channel whose state is known to the receiver during each transmission, the channel capacity is equal to the sum of the individual subchannel capacities.

Proof. 


If the receiver knows the channel state, the applicable conditional probability for the channel, corresponding to the state $\alpha$, is

$$p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M) .$$

Thus, the MS channel with state known at the receiver is already a dependence-removed channel. We have from Eq. (2-15)

$$\sum_{i=1}^{M} I^{\alpha_i}_{p_i}(X_i; Y_i) = I^{\alpha}_{p^*}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M)$$ (2-27)

where the channel state $\alpha$, and input (product) distribution

$$p^* = \prod_{i=1}^{M} p_i(x_i)$$


are  now  both  made  explicit  parameters  of  the  informational  expressions.  Averaging  over  the 
channel  states,  we  have 


† Theorem 4.5.1 of Ref. 4 shows that the distribution given yields capacity. Corollary 2 to this theorem states that there is a unique output distribution corresponding to capacity. Since the transition matrix for the channel is nonsingular, the input distribution must be unique as well.

‡ There is no difficulty posed by the fact that we use an augmented output in our definition of mutual information. Since channel state is independent of input, $I(X; YA^M) = I(X; Y/A^M)$.
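$$\sum_{\alpha \in A^M} p(\alpha) \sum_{i=1}^{M} I^{\alpha_i}_{p_i}(X_i; Y_i) = \sum_{\alpha \in A^M} p(\alpha)\, I^{\alpha}_{p^*}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M)$$ (2-28)

which becomes

$$\sum_{i=1}^{M} \sum_{\alpha_i \in A} p_i(\alpha_i)\, I^{\alpha_i}_{p_i}(X_i; Y_i) = \sum_{\alpha \in A^M} p(\alpha)\, I^{\alpha}_{p^*}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M) .$$ (2-29)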


Given a dependence-removed channel, the mutual information between input and output for any joint distribution on the subchannel inputs is never greater than that corresponding to the product distribution with the same single-subchannel marginal distributions as the original joint distribution.⁴ Hence, for the purpose of discussing capacity, we need only consider independent subchannel inputs. The capacity $C'$ of the MS channel with state known at the receiver is then obtained by maximizing the RHS of Eq. (2-29) over all product input distributions $p^*$. Hence,

$$C' = \max_{p^*} \sum_{\alpha \in A^M} p(\alpha)\, I^{\alpha}_{p^*}(X_1, \ldots, X_M;\, Y_1, \ldots, Y_M)$$

$$= \max_{p^*} \sum_{i=1}^{M} \sum_{\alpha_i \in A} p_i(\alpha_i)\, I^{\alpha_i}_{p_i}(X_i; Y_i)$$

$$= \sum_{i=1}^{M} \max_{p_i} \sum_{\alpha_i \in A} p_i(\alpha_i)\, I^{\alpha_i}_{p_i}(X_i; Y_i) .$$ (2-30)

But, the last expression in Eq. (2-30) is clearly just a sum of individual subchannel capacities $C_i'$, with the state known at the receiver. Hence,

$$C' = \sum_{i=1}^{M} C_i'$$ (2-31)


as  required. 
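The key step of this proof, Eq. (2-29), can be checked numerically on a small case. The sketch below assumes a hypothetical MS channel (not one from the report) with two binary subchannels, each a binary symmetric channel whose crossover probability depends on its state, and a dependent joint state distribution. It evaluates both sides of Eq. (2-29) for an arbitrary product input.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

# Hypothetical example values (assumptions, not from the report).
eps = {1: 0.05, 2: 0.40}                       # crossover in state 1 or 2
p_state = {(1, 1): 0.4, (2, 2): 0.4, (1, 2): 0.1, (2, 1): 0.1}
p1, p2 = [0.3, 0.7], [0.6, 0.4]                # independent subchannel inputs
p_joint = [p1[x // 2] * p2[x % 2] for x in range(4)]

def joint_channel(a1, a2):
    """Product channel for the joint state (a1, a2)."""
    c1, c2 = bsc(eps[a1]), bsc(eps[a2])
    return [[c1[x // 2][y // 2] * c2[x % 2][y % 2] for y in range(4)]
            for x in range(4)]

# RHS of Eq. (2-29): average over joint states of the full mutual information.
rhs = sum(pa * mutual_information(p_joint, joint_channel(*a))
          for a, pa in p_state.items())

# LHS of Eq. (2-29): per-subchannel averages over the marginal state
# distributions p_i(alpha_i).
lhs = 0.0
for i, p_i in enumerate((p1, p2)):
    for s in eps:
        marginal = sum(pa for a, pa in p_state.items() if a[i] == s)
        lhs += marginal * mutual_information(p_i, bsc(eps[s]))

print(f"LHS = {lhs:.6f} bit, RHS = {rhs:.6f} bit")
```

Only the marginal state distributions enter the left-hand side, which is why the dependence between subchannel states does not affect the state-known capacity.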
CHAPTER 3

STATE  REPRESENTATIONS  AND  BOUNDS 
FOR  MUTUAL  INFORMATION  AND  PROBABILITY  OF  ERROR 

A.  STATE  REPRESENTATIONS 

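The formula for the conditional probability of the output $y_1, \ldots, y_M$ of an MS channel given the input $x_1, \ldots, x_M$ is given by Eq. (2-7) which is rewritten below.

$$p(y_1, \ldots, y_M / x_1, \ldots, x_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M) \times p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M)$$ (3-1)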


The definitions of Chapter 2 apply to all the expressions in this formula (see p. 6). Suppose we are given a conditional probability distribution $p(y_1, \ldots, y_M / x_1, \ldots, x_M)$ which can be expressed in the form of the RHS of Eq. (3-1) and is therefore the conditional probability distribution associated with an MS channel. Is the representation unique or are there other sets of subchannel conditional probabilities $r_{\gamma_i}(y_i/x_i)$, $\gamma_i \in \Gamma$, and probability distributions $p(\gamma_1, \ldots, \gamma_M)$ such that

$$p(y_1, \ldots, y_M / x_1, \ldots, x_M) = \sum_{\gamma_1 \in \Gamma} \cdots \sum_{\gamma_M \in \Gamma} p(\gamma_1, \ldots, \gamma_M) \times r_{\gamma_1}(y_1/x_1) \cdots r_{\gamma_M}(y_M/x_M) \; ?$$

The answer to this question is that the representation Eq. (3-1) is not, in general, unique. This may best be shown by an example.


Example  1 

Let $p_1(\xi/\eta)$ and $p_2(\xi/\eta)$ be given below:


Let $p(\alpha_1, \alpha_2)$ be given by

$$p(1,1) = p(2,2) = 1/2 \qquad p(1,2) = p(2,1) = 0 .$$

Then the representation for $p(y_1y_2/x_1x_2)$ can itself be represented by the diagram:
[Diagram of the first representation: channel states (1,1) and (2,2), each with probability $p(\alpha) = 1/2$, together with the corresponding subchannel conditional probabilities $p_\alpha(y/x)$.]
$p(y_1y_2/x_1x_2)$ can be represented by the matrix

                    x1x2
  y1y2      00     01     10     11
   00      5/8    1/8    1/8    1/8
   01      1/8    5/8    1/8    1/8
   10      1/8    1/8    5/8    1/8
   11      1/8    1/8    1/8    5/8

Let $p(\gamma_1, \gamma_2)$ be given by

$$p(1,3) = 3/4$$

$$p(2,4) = 1/4$$

$$p(\gamma_1, \gamma_2) = 0 \text{ unless } (\gamma_1, \gamma_2) = (1,3) \text{ or } (2,4) .$$
The diagram for the representation is:


One may check directly that this new representation yields the same probability distribution $p(y_1y_2/x_1x_2)$, and hence the same matrix, as is given above.

To conclude this example, we give yet another representation for the $p(y_1y_2/x_1x_2)$ given by the matrix above. Let $s_1(\xi/\eta)$ and $s_2(\xi/\eta)$ be given below:

[Diagrams of the subchannels $s_1$ and $s_2$ over the binary alphabet {0, 1}.]

Let $p(\beta_1, \beta_2)$ be given by

$$p(1,1) = 5/8 \qquad p(1,2) = p(2,1) = p(2,2) = 1/8 .$$

The diagram for the representation is given below:
We note that the channel states used here are "pure" channels, i.e., given the input and the channel state, the output is completely determined. The "burden of randomness" is placed entirely on the channel state probability distribution. Thus, in some sense, this last representation is a "canonic" representation. The existence of canonic representations is not peculiar to the channel example given above. For any MS channel with finite input and output alphabets, there always exists a representation for which in each channel state the subchannel conditional probabilities are all either 0 or 1, and hence for which the channel state probability distribution supplies all the randomness. This fact is proven in Appendix D. There is often more than one canonic representation for a given MS channel.
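A canonic representation of the Example 1 matrix can be checked mechanically. The sketch below assumes that the two pure subchannels are the identity map and the complement map on {0, 1} (the original diagrams of $s_1$ and $s_2$ are not legible, so this choice is an assumption) and uses the state distribution $p(1,1) = 5/8$, $p(1,2) = p(2,1) = p(2,2) = 1/8$. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

# Assumed pure subchannels: state 1 passes the bit, state 2 inverts it.
pure = {1: lambda bit: bit, 2: lambda bit: 1 - bit}

p_beta = {(1, 1): Fraction(5, 8), (1, 2): Fraction(1, 8),
          (2, 1): Fraction(1, 8), (2, 2): Fraction(1, 8)}

def transition(x1, x2, y1, y2):
    """p(y1 y2 / x1 x2): total probability of the states mapping x to y."""
    return sum(prob for (b1, b2), prob in p_beta.items()
               if pure[b1](x1) == y1 and pure[b2](x2) == y2)

# Index 2*a + b stands for the pair ab (00, 01, 10, 11).
matrix = [[transition(x >> 1, x & 1, y >> 1, y & 1) for y in range(4)]
          for x in range(4)]

for row in matrix:
    print("  ".join(str(entry) for entry in row))
```

Under this assumption every input/output pair is produced by exactly one joint state, so each matrix entry is simply that state's probability.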

B.  ENTROPY 

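The notion of channel representation leads naturally to the idea of channel entropy. If the input-output statistics of a channel are given by an expression of the form in Eq. (3-1), one might wish to define the entropy $H$ of the channel representation by the formula†

$$H = -\sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M) \log p(\alpha_1, \ldots, \alpha_M) .$$ (3-2)

If we compute the entropies of the three representations in Example 1, we obtain, in order,

$$H_1 = -\tfrac{1}{2} \log_2 \tfrac{1}{2} - \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 \text{ bit}$$

$$H_2 = -\tfrac{3}{4} \log_2 \tfrac{3}{4} - \tfrac{1}{4} \log_2 \tfrac{1}{4} = 0.811 \text{ bit}$$

$$H_3 = -\tfrac{5}{8} \log_2 \tfrac{5}{8} - \tfrac{3}{8} \log_2 \tfrac{1}{8} = 1.549 \text{ bits}$$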


Thus, the entropy of a channel representation is not determined by the channel's input-output conditional probability distribution alone. Therefore, we may not simply associate the quantity given by Eq. (3-2) with the entropy of the channel. We note, however, that the entropy is both non-negative and continuous in $p(\alpha_1, \ldots, \alpha_M)$. Hence, among all representations of the channel, there must be at least one which gives a smallest value for the entropy of the representation.
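The entropies of the three state distributions of Example 1 follow directly from Eq. (3-2). A minimal sketch, with an illustrative helper name:

```python
import math

def entropy_bits(probabilities):
    """Entropy in bits of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

h1 = entropy_bits([1/2, 1/2, 0, 0])        # p(alpha1, alpha2), first representation
h2 = entropy_bits([3/4, 1/4])              # p(gamma1, gamma2), second representation
h3 = entropy_bits([5/8, 1/8, 1/8, 1/8])    # p(beta1, beta2), third representation

for name, value in (("H1", h1), ("H2", h2), ("H3", h3)):
    print(f"{name} = {value:.3f} bits")
```

All three distributions describe the same input-output matrix, yet their entropies differ, which is the point made in the text.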

Despite the problem with uniqueness, the entropy of a channel representation is, in some instances, a simple and natural quantity to use in bounding the mutual information between its input and output. This fact will be demonstrated in the sequel.

C.  NATURAL  STATE  REPRESENTATIONS 

It should be clear at this point that it is impossible to decide which channel representation is a "natural" one from the input-output probabilities alone. The naturalness of a channel representation will depend on the relationship between the states $\alpha$ of the mathematical model and the processes which take place in the physical channel. The choice of a natural state representation is important because we will often talk about the situation where the receiver has knowledge of the channel state. If the representation is natural, this knowledge may usually be obtained through measurement of some physical quantity. Generally, the models we shall use (e.g., the first representation of Example 1) are natural ones for a set of fading subchannels with equal energy orthogonal signaling on each subchannel. The observable which the receiver may use to obtain state (corresponding to depth of fade) knowledge is received signal energy.

† We will limit our discussion of channel entropy and its properties to cases where the state distribution is discrete.

D.  BOUNDS  ON  MUTUAL  INFORMATION 

We  now  proceed  to  derive  some  relations  involving  the  mutual  information  between  the  input 
and  output  of  a  channel  with  discrete  states.  The  channel  is  not  necessarily  MS. 

Theorem  3.1. 

Suppose we have an input random variable $x$ which may take on values in a space $X$, an output random variable $y$ which may take on values in a space $Y$, and a discrete collection $G$ of channels $g$, each with input alphabet $X$ and output alphabet $Y$. If probability distributions are given over $G$ and $X$, and $x$ and $g$ are independent, then

$$I(X; Y/G) - H(G) \le I(X; Y) \le I(X; Y/G)$$ (3-3)

where $H(G)$ is the entropy of the probability distribution over $G$. If $G$ is not discrete, the right-hand inequality in Eq. (3-3) still holds. The situation is shown schematically in Fig. 5.


Fig. 5. General channel with state structure.


Proof. 

$$I(X; YG) = I(X; Y) + I(X; G/Y)$$

$$I(X; YG) = I(X; G) + I(X; Y/G)$$

But $I(X; G) = 0$, since $x$ and $g$ are independent. Thus,

$$I(X; Y/G) = I(X; Y) + I(X; G/Y) .$$ (3-4)

Also,

$$0 \le I(X; G/Y) \le H(G) .$$ (3-5)

The right-hand inequality in Eq. (3-5) holds if $G$ is discrete. Combining Eqs. (3-4) and (3-5), we have Eq. (3-3).
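The two-sided bound of Eq. (3-3) can be illustrated on a small hypothetical channel (the numbers below are assumptions chosen for the illustration): a binary symmetric channel whose crossover probability is 0.05 in one state and 0.30 in the other, with the state independent of a uniform input.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

eps = [0.05, 0.30]    # hypothetical crossover probability in each state g
p_g = [0.7, 0.3]      # hypothetical state distribution
p_x = [0.5, 0.5]

# I(X;Y/G): average of the per-state mutual informations.
i_given_g = sum(pg * mutual_information(p_x, bsc(e)) for pg, e in zip(p_g, eps))

# I(X;Y): the receiver sees only the state-averaged channel, which for
# binary symmetric states is again symmetric with the averaged crossover.
i_plain = mutual_information(p_x, bsc(sum(pg * e for pg, e in zip(p_g, eps))))

h_g = -sum(pg * math.log2(pg) for pg in p_g if pg > 0)

print(f"I(X;Y/G) - H(G) = {i_given_g - h_g:.4f}")
print(f"I(X;Y)          = {i_plain:.4f}")
print(f"I(X;Y/G)        = {i_given_g:.4f}")
```

In this example the lower bound is negative and therefore uninformative, which shows how the tightness of Eq. (3-3) depends on $H(G)$.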
Now we may interpret Eq. (3-3). In the first place, $g$ represents a "channel state" in just the sense we have been using this term. Hence, the rightmost inequality of Eq. (3-3) implies that knowing the channel state increases the mutual information between input and output. It is a somewhat disguised form of the statement that mutual information is a convex downward function of the channel transition probabilities.

From our assumptions, it is clear that

$$p(y/x) = \sum_{g \in G} p(g)\, p_g(y/x) = \sum_{g \in G} p(g)\, p(y/xg) .$$†

Hence, $H(G)$ is the entropy of a channel representation, and the leftmost inequality of Eq. (3-3) states that we need subtract only this entropy from the upper bound to $I(X; Y)$ to obtain a lower bound. $H(G)$ is thus a measure of the tightness of the bounds.


E.  RANDOM  CODING  BOUND 

Coding is a subject we have not discussed, as yet. For a discrete channel without parallel structure, a random coding bound is derived by choosing a probability distribution over all input letter sequences of length $N$, picking the requisite number of code words independently at random according to this distribution, computing an upper bound to the probability of error given that a particular message is transmitted, and averaging over the ensemble of possible codes. Since the bound thus obtained is independent of the particular message chosen, it is a bound to the average probability of error for the code.

In the discussion to follow, the bounding technique of Gallager¹‡ will be used; we shall state some of his results below. First, we must give the notation and assumptions. Let $X^N$ be the set of all sequences of length $N$ that can be transmitted on a given channel, and let $Y^N$ be the set of all sequences of length $N$ that can be received. We assume that both $X^N$ and $Y^N$ are finite sets. Let $p(y/x)$, for $y \in Y^N$ and $x \in X^N$, be the conditional probability of receiving sequence $y$ given that $x$ was transmitted. We assume that we have a code consisting of $W$ code words, that is, a mapping of the integers from 1 to $W$ into a set of code words $x_1, \ldots, x_W$, where $x_m \in X^N$; $1 \le m \le W$. We also assume that maximum-likelihood decoding is performed at the receiver. Finally, we define a probability measure $p(x)$ on $X^N$ and use $\bar{P}_{em}$ to denote the average over the ensemble of codes of the probability of error, given that the $m$th code word was transmitted.

Now we state the following result of Gallager,²

$$\bar{P}_{em} \le (W-1)^{\rho} \sum_{y \in Y^N} \left[ \sum_{x \in X^N} p(x)\, p(y/x)^{1/(1+\rho)} \right]^{1+\rho}$$ (3-6)

for any $\rho$, $0 \le \rho \le 1$.

† Here and in the remainder of this report, we will freely use the notational convention that $p_t(u/v) = p(u/vt)$.
‡ Numbered references appear at the end of each chapter.

If we make some further assumptions, we can simplify the bound of Eq. (3-6). Let $x_1, \ldots, x_N$ be the individual letters in an input sequence $x$, and let $y_1, \ldots, y_N$ be the letters in an output sequence $y$. We now assume that the channel is memoryless and time invariant so that

$$p(y/x) = \prod_{n=1}^{N} p(y_n/x_n)$$ (3-7)

and that the probability distribution $p(x)$ on input sequences factors into a product of individual letter probability distributions as follows:

$$p(x) = \prod_{n=1}^{N} p(x_n) .$$ (3-8)

Then, a bound on the ensemble probability of decoding error which is independent of the probabilities with which the code words are used, is obtained in the form

$$\bar{P}_e \le \exp[-N E(R)]$$ (3-9)

where $E(R)$ is called the random coding exponent and is defined by the equations

$$E_0(\rho, p) = -\ln \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p(j/k)^{1/(1+\rho)} \right]^{1+\rho}$$ (3-10)

and

$$E(R) = \max_{\rho,\, p} \left[ -\rho R + E_0(\rho, p) \right] .$$ (3-11)

We have assumed that the channel input alphabet consists of the integers from 1 to $K$, and that the channel output alphabet consists of the integers from 1 to $J$. The maximization in Eq. (3-11) is over all $\rho$, $0 \le \rho \le 1$, and all (input letter) probability vectors $p$. $R$ is the rate in natural units (i.e., $W = \exp[NR]$).
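The definitions in Eqs. (3-10) and (3-11) translate directly into a small numerical routine. The sketch below is an illustration only: it approximates the maximization by a grid search over $\rho$ and over binary input distributions, and applies it to a hypothetical binary symmetric channel with crossover probability 0.1. The function names and the grid resolution are assumptions.

```python
import math

def e0(rho, p, channel):
    """E_0(rho, p) of Eq. (3-10), in nats; channel[k][j] = p(j/k)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(p[k] * channel[k][j] ** (1 / (1 + rho))
                    for k in range(len(p)))
        total += inner ** (1 + rho)
    return -math.log(total)

def random_coding_exponent(rate, channel, steps=100):
    """Grid-search approximation of E(R), Eq. (3-11), for binary inputs."""
    best = 0.0                      # value attained at rho = 0
    for a in range(steps + 1):
        rho = a / steps
        for b in range(steps + 1):
            q = b / steps
            best = max(best, -rho * rate + e0(rho, [q, 1 - q], channel))
    return best

bsc = [[0.9, 0.1], [0.1, 0.9]]      # hypothetical example channel
for rate in (0.1, 0.2, 0.5):        # rates in nats per channel use
    print(f"R = {rate:.1f}: E(R) ~ {random_coding_exponent(rate, bsc):.4f}")
```

For this channel the capacity is about 0.368 nat, so the exponent is positive at the first two rates and vanishes at R = 0.5.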

Now, the question arises of what to do about the parallel structure of a channel if such structure exists. In the first place, Gallager's bounds in Eqs. (3-6) and (3-9) apply without change to a channel with parallel structure if we understand that a "letter" ($x_n$ or $y_n$) in the sense used above is an $M$-tuple which is composed of the inputs to or outputs of the $M$ subchannels of an arbitrary $M$-input, $M$-output channel. Then, if each subchannel input has an $L$ letter alphabet and each subchannel output has a $Q$ letter alphabet, $K = L^M$ and $J = Q^M$. We note that these statements do not depend on the assumptions of Eqs. (3-7) and (3-8). Although Gallager's bounds are fully applicable to the situation we wish to study, some further structure will have to be imposed so that these bounds will be productive of insight in spite of the additional complexity of our channel model. Some of this structure is already implicit in our MS channel model. In addition, we will make notational changes which will facilitate the explanation of some of our results. Let the rate per subchannel $R_s$ be defined by

$$R_s = \frac{R}{M} .$$ (3-12)

Define

$$E_M(R_s) = E(R)$$ (3-13)

where there are $M$ subchannels and $E(R)$ is given by Eq. (3-11). Then, we have from Eqs. (3-13) and (3-9) that

$$\bar{P}_e \le \exp[-N E_M(R_s)] .$$ (3-14)

Now, let us consider a special case. Suppose that the MS channel consists of $M$ identical and independent subchannels. Then, Theorem 5 of Gallager implies that $E_M(R_s)$ may be further decomposed so that

$$E_M(R_s) = M E_s(R_s)$$ (3-15)

where

$$E_s(R_s) = \max_{\rho,\, p_s} \left\{ -\rho R_s - \ln \sum_{q=1}^{Q} \left[ \sum_{\ell=1}^{L} p_s(\ell)\, p(q/\ell)^{1/(1+\rho)} \right]^{1+\rho} \right\}$$ (3-16)

and the maximization is performed over all $\rho$, $0 \le \rho \le 1$, and all probability vectors $p_s$ defined on the subchannel input alphabet. All the quantities in Eq. (3-16) refer to a single subchannel. From Eqs. (3-14) and (3-15), we have

$$\bar{P}_e \le \exp[-N M E_s(R_s)] .$$ (3-17)

It should be strongly emphasized here that this result depends on our coding simultaneously in the "parallel" and "time" directions. According to Gallager's Theorem 5, one of the conditions for the maximum required in the definition of $E_M(R_s)$ is that each subchannel letter be chosen independently of the other subchannel letters at that instant of time, and independently of all subchannel letters at other instants of time. A code word may be thought of as a matrix with $M$ rows and $N$ columns. Each element of the matrix is a letter in the subchannel input alphabet. Each column of the matrix is a letter in the (whole) channel input alphabet. The bound of Eq. (3-17) assumes each matrix element is chosen independently of the others. Clearly, each output word may also be thought of as an $M \times N$ matrix with elements equal to letters of the subchannel output alphabet.
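The decomposition for identical, independent subchannels rests on the fact that the function $E_0$ of Eq. (3-10) adds over independent subchannels when the subchannel inputs are chosen independently. The sketch below checks this numerically for $M = 2$ with a hypothetical binary subchannel and input distribution (both are assumptions made for the illustration).

```python
import math

def e0(rho, p, channel):
    """E_0(rho, p) of Eq. (3-10), in nats; channel[k][j] = p(j/k)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(p[k] * channel[k][j] ** (1 / (1 + rho))
                    for k in range(len(p)))
        total += inner ** (1 + rho)
    return -math.log(total)

sub = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical subchannel p(q/l)
p_sub = [0.4, 0.6]                # hypothetical subchannel input distribution

# Two independent copies used in parallel: a channel "letter" is a pair,
# with index 2*a + b standing for the pair ab.
pair_channel = [[sub[x // 2][y // 2] * sub[x % 2][y % 2] for y in range(4)]
                for x in range(4)]
pair_input = [p_sub[x // 2] * p_sub[x % 2] for x in range(4)]

for rho in (0.25, 0.5, 1.0):
    whole = e0(rho, pair_input, pair_channel)
    parts = 2 * e0(rho, p_sub, sub)
    print(f"rho = {rho}: E_0(pair) = {whole:.6f}, 2 * E_0(single) = {parts:.6f}")
```

The check covers only product inputs; that a product input also attains the maximum is the content of Gallager's theorem cited above and is not verified here.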

In what follows, we shall generally be studying MS channels whose subchannels are not independent, although the MS channel itself is memoryless. Thus, we shall obtain bounds of the form of Eq. (3-14).


F. STATE KNOWLEDGE - SOME GENERAL CONSIDERATIONS†

When dealing with a channel that has a state structure, one naturally expects that knowledge of the state at the receiver will be advantageous, both in terms of increasing the capacity and decreasing the probability of error. It would also be expected that partial knowledge of the state at the receiver is better than no knowledge, but not as good as complete knowledge.

In dealing with capacity, we may work directly on the mathematical expressions involved. The situation with regard to probability of error is somewhat different. Here, we know that receiver knowledge cannot increase the probability of error because the receiver uses this knowledge optimally (i.e., it computes the likelihoods of code words based on what state knowledge it has). Since one of the options available to the receiver is to ignore any state knowledge it may have, the optimal receiver must do at least as well as this. This same inequality must apply to the ensemble probability of decoding error because it applies to each member of the ensemble. However, we do not generally compute this probability of error; we compute the random coding exponent (RCE). We would hope that an inequality between ensemble probabilities of decoding error for two categories of receiver state knowledge would be reflected in the opposite inequality between the corresponding RCE's. This is indeed the case, but it is necessary to pursue the mathematical properties of the RCE in order to prove it.

† The remarks and results in the remainder of this chapter are not limited to channels with parallel structure.

Suppose  we  have  an  input  random  variable  x  which  may  take  on  values  in  a  space  X,  an 
output  random  variable  y  which  may  take  on  values  in  a  space  Y,  and  a  collection  G  of  channels 
g,  each  with  input  alphabet  X  and  output  alphabet  Y.  By  complete  receiver  knowledge  of  the 
channel  state,  we  mean  that  the  receiver  knows  g.  By  partial  receiver  knowledge  of  the  channel 
state, we mean that the receiver knows some observable $t \in T$† which is related to $g$.‡ We shall assume that a distribution $p(xygt)$ on $X \times Y \times G \times T$ is given, and that

$$p(y/xgt) = p(y/xg)$$ (3-18)

and

$$p(gt/x) = p(gt) .$$§ (3-19)

The first assumption, Eq. (3-18), is consistent with our terminology of "partial" and "complete" knowledge, i.e., once $g$ is known, $t$ becomes irrelevant for the computation of $p(y/x)$. The second assumption, Eq. (3-19), is equivalent to the statement that the pair $(g, t)$ conveys no information about $x$. Equations (3-18) and (3-19) taken together imply

$$p(t/gyx) = p(t/g) .$$ (3-20)

This assures us that the receiver need not consider $t$ if it knows $g$ (see Ref. 4).

In  the  remarks  following  the  proof  of  Theorem  3.1,  it  was  noted  that  the  effect  of  state  knowl¬ 
edge  in  increasing  mutual  information  was  related  to  the  convexity  (downward)  of  the  mutual  in¬ 
formation  as  a  function  of  the  input-output  conditional  probabilities.  The  following  theorem  on 
convex  functions  will  be  useful  in  the  sequel. 

Theorem  3.2. 

Let  f  be  a  convex  downward  function  of  transition  probabilities  p(y/x),  and  assume  that  a 
probability  distribution  p(xygt)  on  X  x  Y  x  G  x  T  is  given  such  that  Eqs.(3-18)  and  (3-19)  are 
satisfied.  Then, 


† None of the spaces X, Y, G, and T need be finite or even discrete. We will proceed as though all the spaces were discrete, and remark that an appropriate replacement of sums by integrals covers the other cases, until we reach Theorem 3.6 which requires G to be finite.

‡ For example, if we are dealing with a single fading channel with binary input and output alphabets and equal transmitted energy allotted to 0 and 1, g would be the bit crossover probability, and t would be the energy of the received waveform.

§ Assumptions in Eqs. (3-18) and (3-19) and all the statements in the paragraph preceding them shall be in effect for the remainder of this chapter.
$$f[p(y/x)] \le \sum_{t \in T} p(t)\, f[p(y/xt)] \le \sum_{g \in G} p(g)\, f[p(y/xg)] .$$ (3-21)

[If $f$ is convex upward, the inequalities in Eq. (3-21) are reversed.]

Proof. 


$$p(y/xt) = \sum_{g \in G} p(g/tx)\, p(y/gtx) = \sum_{g \in G} p(g/t)\, p(y/xg)$$ (3-22)

from Eqs. (3-18) and (3-19).

Also,

$$p(y/x) = \sum_{t \in T} p(y/xt)\, p(t/x) = \sum_{t \in T} p(y/xt)\, p(t)$$ (3-23)

from Eq. (3-19). Hence,

$$f[p(y/x)] \le \sum_{t \in T} p(t)\, f[p(y/xt)]$$ (3-24)

from convexity of $f$ and Eq. (3-23), and

$$f[p(y/xt)] \le \sum_{g \in G} p(g/t)\, f[p(y/xg)]$$ (3-25)

from convexity of $f$ and Eq. (3-22). Inequality (3-25) implies

$$\sum_{t \in T} p(t)\, f[p(y/xt)] \le \sum_{t \in T} p(t) \sum_{g \in G} p(g/t)\, f[p(y/xg)] = \sum_{g \in G} p(g)\, f[p(y/xg)] .$$ (3-26)

Equations (3-24) and (3-26) are equivalent to Eq. (3-21).

G.  STATE  KNOWLEDGE,  MUTUAL  INFORMATION  AND  CAPACITY 
Theorem  3.3. 

$$I(X; Y) \le I(X; Y/T) \le I(X; Y/G) .$$ (3-27)

Proof.

Since the mutual information is a convex downward function of the transition probabilities, this follows directly from Theorem 3.2.
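The ordering in Eq. (3-27) can be illustrated with a hypothetical three-state fading model (all numbers are assumptions for the illustration). The state $g$ selects the crossover probability of a binary symmetric channel, and the observable $t$ is a coarse function of $g$ that only tells the receiver whether the channel is in its best state. Because $t$ is a function of $g$ and $g$ is independent of the input, Eqs. (3-18) and (3-19) are satisfied.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y/x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(len(p_x)) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

eps = [0.01, 0.10, 0.40]    # hypothetical crossover probability per state g
p_g = [0.5, 0.3, 0.2]       # hypothetical state distribution
t_of_g = [0, 1, 1]          # coarse observable: t = 0 only in the best state
p_x = [0.5, 0.5]

# No state knowledge: the receiver sees the fully averaged channel.
i_none = mutual_information(p_x, bsc(sum(p * e for p, e in zip(p_g, eps))))

# Partial knowledge: average the channel within each value of t.
i_t = 0.0
for t in set(t_of_g):
    p_t = sum(p_g[g] for g in range(3) if t_of_g[g] == t)
    eps_t = sum(p_g[g] * eps[g] for g in range(3) if t_of_g[g] == t) / p_t
    i_t += p_t * mutual_information(p_x, bsc(eps_t))

# Complete knowledge: average of the per-state mutual informations.
i_g = sum(p * mutual_information(p_x, bsc(e)) for p, e in zip(p_g, eps))

print(f"I(X;Y) = {i_none:.4f}  I(X;Y/T) = {i_t:.4f}  I(X;Y/G) = {i_g:.4f}")
```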

Theorem  3.4. 

Denote the capacity of the channel when the receiver knows neither $t$ nor $g$ as $C$, the capacity when the receiver knows $t$ as $C_t$, and the capacity when the receiver knows $g$ as $C_g$. Then,

$$C \le C_t \le C_g .$$ (3-28)
Proof.

Since Eq. (3-27) holds for all input distributions $p(x)$, Theorems 2.3 or 2.4 give us Eq. (3-28).

This concludes our discussion of the effect of channel state information on the mutual information between input and output and on capacity.

H. STATE KNOWLEDGE AND RANDOM CODING EXPONENT (RCE)

We shall begin this section by deriving some mathematical expressions involved in the definition of the RCE when an auxiliary variable $v \in V$, independent of the input, is known to the receiver.

We note that the RHS of Eq. (3-6) is independent of $m$. Hence, it is a bound on the ensemble probability of decoding error and is independent of the probabilities with which the code words are used. If, during an input sequence of length $N$, the variable $v$ assumes the values $v_1, \ldots, v_N$, then we assume the conditional probability relating input and output sequences to be given by

$$p_v(y/x) = \prod_{n=1}^{N} p_{v_n}(y_n/x_n) .$$ (3-29)

To obtain the RCE [$E^v(R)$] corresponding to receiver knowledge of $v$, we must substitute Eqs. (3-8) and (3-29) in Eq. (3-6), replace $(W - 1)$ by $W = e^{RN}$, average over the distribution† of $v$, divide the negative of the natural logarithm of the result by $N$, and perform a maximization. Thus, if we define

$$E^v(\rho, p, R) = -\frac{1}{N} \ln \left\{ e^{\rho R N} \sum_{v \in V^N} p(v) \sum_{y \in Y^N} \left[ \sum_{x \in X^N} \prod_{n=1}^{N} p(x_n) \left( \prod_{n=1}^{N} p_{v_n}(y_n/x_n) \right)^{1/(1+\rho)} \right]^{1+\rho} \right\}$$ (3-30)

where $V^N$ is the set of all sequences of $v$'s of length $N$, $v \in V^N$, we have

$$E^v(R) = \max_{\rho,\, p} E^v(\rho, p, R) .$$ (3-31)

We shall assume

$$p(v) = \prod_{n=1}^{N} p(v_n) .$$ (3-32)

This corresponds to the assumption of time invariance and memorylessness if $v$ is the state variable. Substituting Eq. (3-32) in Eq. (3-30) and reducing the result, we get


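$$ \begin{aligned} E^{v}(\rho, \mathbf{p}, R) &= -\frac{1}{N} \ln \left\{ e^{\rho N R} \left[ \sum_{v \in V} p(v) \sum_{y \in Y} \left[ \sum_{x \in X} p(x)\, p_v(y/x)^{1/(1+\rho)} \right]^{1+\rho} \right]^{N} \right\} \\ &= -\rho R - \ln \sum_{v \in V} p(v) \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p_v(j/k)^{1/(1+\rho)} \right]^{1+\rho} . \end{aligned} \tag{3-33} $$

Define

$$ F^{v}(\rho, \mathbf{p}) = \sum_{v \in V} p(v) \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p_v(j/k)^{1/(1+\rho)} \right]^{1+\rho} . \tag{3-34} $$

Thus,

$$ E^{v}(\rho, \mathbf{p}, R) = -\rho R - \ln F^{v}(\rho, \mathbf{p}) . \tag{3-35} $$

We may also define

$$ E(\rho, \mathbf{p}, R) = -\rho R - \ln \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p(j/k)^{1/(1+\rho)} \right]^{1+\rho} \tag{3-36} $$

and

$$ F(\rho, \mathbf{p}) = \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p(j/k)^{1/(1+\rho)} \right]^{1+\rho} . \tag{3-37} $$

Thus, we have

$$ E(\rho, \mathbf{p}, R) = -\rho R - \ln F(\rho, \mathbf{p}) \tag{3-38} $$

and

$$ E(R) = \max_{0 \le \rho \le 1,\; \mathbf{p} \in P} E(\rho, \mathbf{p}, R) . \tag{3-39} $$

† We will now assume v discrete, X = {1, ..., K}, and Y = {1, ..., J} for purposes of notation. Similar results
may be obtained if any or all of these assumptions are dropped.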


Now, we may easily show that F is a convex upward function of the conditional probabilities
included in its definition. Suppose

$$ p(j/k) = \sum_{v \in V} p(v)\, p_v(j/k) . \tag{3-40} $$

Hence,

$$ F(\rho, \mathbf{p}) = \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k) \left[ \sum_{v \in V} p(v)\, p_v(j/k) \right]^{1/(1+\rho)} \right]^{1+\rho} . \tag{3-41} $$


By applying Eq. (C-3) (Minkowski's inequality) to the inner two sums of the RHS of Eq. (3-41),
we get

$$ \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k) \left[ \sum_{v \in V} p(v)\, p_v(j/k) \right]^{1/(1+\rho)} \right]^{1+\rho} \ge \sum_{j=1}^{J} \sum_{v \in V} p(v) \left[ \sum_{k=1}^{K} p(k)\, p_v(j/k)^{1/(1+\rho)} \right]^{1+\rho} \tag{3-42} $$

as required.

Theorem 3.5.


If $E(R)$ is the RCE corresponding to no state knowledge at the receiver, $E^{t}(R)$ and $E^{g}(R)$
are the RCE's corresponding to receiver knowledge of t and g, respectively, and Eqs. (3-29)
and (3-32) hold for t, g, or a blank replacing v, then

$$ E(R) \le E^{t}(R) \le E^{g}(R) . \tag{3-43} $$

Proof.

By the convexity upward of F and Theorem 3.2, we have

$$ F^{g}(\rho, \mathbf{p}) \le F^{t}(\rho, \mathbf{p}) \le F(\rho, \mathbf{p}) . $$

Hence,

$$ -\rho R - \ln F(\rho, \mathbf{p}) \le -\rho R - \ln F^{t}(\rho, \mathbf{p}) \le -\rho R - \ln F^{g}(\rho, \mathbf{p}) . $$

By Eqs. (3-35) and (3-38), we have

$$ E(\rho, \mathbf{p}, R) \le E^{t}(\rho, \mathbf{p}, R) \le E^{g}(\rho, \mathbf{p}, R) . $$

Our result follows from Eqs. (3-31), (3-39), and Theorem 2.3 (or Theorem 2.4).


It should be emphasized that not only must the state variable g and partial knowledge variable
t satisfy Eqs. (3-18) and (3-19), but successive values of g and t must be independent and
identically distributed. In addition, the single-letter conditional probability of the channel must
depend only on the value g or t assumes during the transmission of a single letter. If all these
assumptions hold, we shall say that the channel with complete or partial state knowledge is still
memoryless and time invariant.

We avoided making these additional assumptions in Theorems 3.3 and 3.4, but the results
there are "one-shot" results. If the additional assumptions are made, the results become
"per-transmitted-letter" results as well.

We have devoted a fair amount of space to showing that state knowledge increases the RCE,
a result which is analogous to the result that state knowledge increases mutual information.
Included in the mathematical statement [Eq. (3-3)] of this last fact is a bound on the magnitude of
the increase. We shall now derive an analogous result for the case of the RCE. Our notation
and assumptions remain the same.

Theorem 3.6.

Let G, the set of channel states, contain S elements, and let the channel with or without
state information be memoryless and time invariant. Let $\rho'$ be the value of $\rho$ which achieves
the maximum required in the definition of $E^{g}(R)$. Then,

$$ E^{g}(R) - \rho' \ln S \le E(R) \le E^{g}(R) . \tag{3-44} $$

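Proof.

From Eq. (C-1),

$$ p(j/k)^{1/(1+\rho)} = \left[ \sum_{g \in G} p(g)\, p_g(j/k) \right]^{1/(1+\rho)} \le \sum_{g \in G} \left[ p(g)\, p_g(j/k) \right]^{1/(1+\rho)} . \tag{3-45} $$

Hence,

$$ \begin{aligned} F(\rho, \mathbf{p}) &= \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k) \left[ \sum_{g \in G} p(g)\, p_g(j/k) \right]^{1/(1+\rho)} \right]^{1+\rho} \\ &\le \sum_{j=1}^{J} \left[ \sum_{g \in G} p(g)^{1/(1+\rho)} \sum_{k=1}^{K} p(k)\, p_g(j/k)^{1/(1+\rho)} \right]^{1+\rho} \\ &= S^{1+\rho} \sum_{j=1}^{J} \left[ \sum_{g \in G} \frac{1}{S}\, p(g)^{1/(1+\rho)} \sum_{k=1}^{K} p(k)\, p_g(j/k)^{1/(1+\rho)} \right]^{1+\rho} . \end{aligned} \tag{3-46} $$

Using Eq. (C-2) on the sum over G, we get

$$ \begin{aligned} S^{1+\rho} \sum_{j=1}^{J} \left[ \sum_{g \in G} \frac{1}{S}\, p(g)^{1/(1+\rho)} \sum_{k=1}^{K} p(k)\, p_g(j/k)^{1/(1+\rho)} \right]^{1+\rho} &\le S^{1+\rho} \sum_{j=1}^{J} \sum_{g \in G} \frac{1}{S}\, p(g) \left[ \sum_{k=1}^{K} p(k)\, p_g(j/k)^{1/(1+\rho)} \right]^{1+\rho} \\ &= S^{\rho} \sum_{g \in G} p(g) \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p_g(j/k)^{1/(1+\rho)} \right]^{1+\rho} \\ &= S^{\rho} F^{g}(\rho, \mathbf{p}) \end{aligned} \tag{3-47} $$

where we use Eq. (3-34) with g replacing v in the last step. Hence,

$$ F(\rho, \mathbf{p}) \le S^{\rho} F^{g}(\rho, \mathbf{p}) \tag{3-48} $$

and

$$ E^{g}(\rho, \mathbf{p}, R) - \rho \ln S \le E(\rho, \mathbf{p}, R) . \tag{3-49} $$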


Suppose

$$ E^{g}(R) = E^{g}(\rho', \mathbf{p}', R) . \tag{3-50} $$

Then,

$$ E^{g}(R) - \rho' \ln S \le E(\rho', \mathbf{p}', R) \le E(R) . \tag{3-51} $$

Since $0 \le \rho' \le 1$, we have immediately

$$ E^{g}(R) - \ln S \le E(R) \tag{3-52} $$

which bears a strong resemblance to the left inequality of Eq. (3-3). Equation (3-51) contains
the left inequality in Eq. (3-44). The right inequality comes directly from Theorem 3.5.

It is important to note that $\rho'$ is implicitly a function of R and is the value of $\rho$ which achieves
the maximum in Eq. (3-31), with g replacing v.
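
The two pointwise inequalities behind Theorems 3.5 and 3.6, namely F^g <= F <= S^rho F^g, can be spot-checked numerically. The sketch below is an illustration under assumed data: three states with probabilities 0.5, 0.3, 0.2, each selecting a binary symmetric channel. None of these numbers, nor the function names, come from the report.

```python
def gallager_f(rho, p_x, channel):
    """F(rho, p) = sum_j [ sum_k p(k) p(j/k)^(1/(1+rho)) ]^(1+rho), as in Eq. (3-37)."""
    s = 1.0 / (1.0 + rho)
    return sum(
        sum(p_x[k] * channel[k][j] ** s for k in range(len(p_x))) ** (1.0 + rho)
        for j in range(len(channel[0]))
    )

def bsc(eps):
    """Transition matrix of a binary symmetric channel with crossover eps."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]

p_g = [0.5, 0.3, 0.2]                        # S = 3 states (assumed)
channels = [bsc(0.0), bsc(0.25), bsc(0.5)]   # channel selected by each state
p_x = [0.4, 0.6]                             # arbitrary input distribution
S = len(p_g)

# Channel seen by a receiver without state knowledge, Eq. (3-40).
averaged = [[sum(pg * ch[k][j] for pg, ch in zip(p_g, channels)) for j in range(2)]
            for k in range(2)]

rows = []
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    f_unknown = gallager_f(rho, p_x, averaged)
    # State-known quantity: Eq. (3-34) with g replacing v.
    f_known = sum(pg * gallager_f(rho, p_x, ch) for pg, ch in zip(p_g, channels))
    rows.append((rho, f_known, f_unknown, S ** rho * f_known))

for row in rows:
    print("rho=%.2f  F^g=%.6f  F=%.6f  S^rho*F^g=%.6f" % row)
```

Each printed row should be non-decreasing from the second to the fourth column; taking negative logarithms then gives Eqs. (3-43) and (3-49) at that value of rho.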

I. SUBCHANNEL DEPENDENCIES AND RCE

It now seems appropriate to remark that there is no RCE counterpart to Theorem 2.5.
Subchannel dependencies may either increase or decrease the RCE. An example will illustrate this
fact.

Example 2

Let three state distributions q, r, and s be given below:

q(1, 1) = q(2, 2) = 1/2    q(1, 2) = q(2, 1) = 0

r(1, 1) = r(1, 2) = r(2, 1) = r(2, 2) = 1/4

s(1, 2) = s(2, 1) = 1/2    s(1, 1) = s(2, 2) = 0

Each state distribution leads to a different 2S channel. We note that the channel corresponding
to r has independent subchannels. It may be verified that the "r channel" is the
dependence-removed channel derived from either the q or s channels. The input distribution which achieves
the maximum required by the definition of the RCE is the same for all three cases:

p(00) = p(01) = p(10) = p(11) = 1/4

This may be verified by using Gallager's Theorem 4 (Ref. 5). The curves of $E_2(R_s)$ vs $R_s$ for
the three cases are given in Fig. 6. Since the curve for the independent subchannels case lies
between the other two, we see that subchannel dependencies may either increase or decrease
the RCE.†

† In fact, the s channel has a zero-error capacity equal to its capacity of 1 bit.

Fig. 6. $E_2(R_s)$ vs $R_s$ (bits) for three different channels.

CHAPTER 4

THE COMPLETELY CONSTRAINED CHANNEL

A. DEFINITION OF CHANNEL

An important limiting case of the MS channel is the class of MS channels with the property
that during the transmission of a single input letter all the subchannel states are the same. We
will call such channels completely constrained (MSCC) channels. For MSCC channels, Eq. (2-7)
becomes

$$ p(y_1, \ldots, y_M / x_1, \ldots, x_M) = \sum_{\alpha \in A} p(\alpha) \prod_{i=1}^{M} p_\alpha(y_i/x_i) \tag{4-1} $$

where we will always assume $p(\alpha) > 0$, $\alpha \in A$. Because of the complete dependence of the
subchannel states, the MSCC channel has some very striking properties; in fact, these are
properties, as $M \to \infty$, of sequences of MSCC channels defined as follows:

(1) A, the set of subchannel states, is the same for all M.

(2) $p(\alpha)$, $\alpha \in A$, is the same for all M.

(3) For the Mth channel in the sequence, $p(y_1, \ldots, y_M / x_1, \ldots, x_M)$ is
given by Eq. (4-1).

B.  EXAMPLES  OF  MSCC  CHANNELS  AND  THEIR  PROPERTIES 

To illuminate the definition of sequences of MSCC channels and provide specific examples
of their general properties, we will discuss two examples.

Example 1

Let A = {1, 2}, p(1) = p(2) = 1/2, and $Y_s = X_s = \{0, 1\}$. Let $p_1(y_i/x_i)$ and $p_2(y_i/x_i)$ be the
binary symmetric distributions with crossover probabilities equal to zero and one-half,
respectively. Then, the Mth channel in our sequence may be represented as in Fig. 7.

Fig. 7. An MSCC channel (Example 1).

Fig. 8. Capacity per subchannel vs number of subchannels for (a) Example 1,
and (b) Example 2.

Fig. 9. An MSCC channel (Example 2).

Now, we will introduce some further notation which will be in effect for the remainder of
this chapter.

Associated with the Mth member of a sequence of MSCC channels, we have a capacity $C_M$
and a random coding exponent $E_M(R_s)$. We define the capacity per subchannel by

$$ C_{sM} = \frac{C_M}{M} . \tag{4-2} $$

We introduce the convention that when the channel state is known at the receiver, the
corresponding capacity and RCE will be represented by primed quantities [e.g., $E'_M(R_s)$],
and when it is unknown at the receiver, by unprimed quantities.

Plots of $C_{sM}$ and $C'_{sM}$ vs M for Example 1 are found in Fig. 8(a).

Example 2

Let A = {1, 2}, p(1) = p(2) = 1/2, and $Y_s = X_s = \{0, 1\}$. Let $p_1(y_i/x_i)$ and $p_2(y_i/x_i)$ be the
binary symmetric distributions with crossover probabilities equal to zero and one-quarter,
respectively. Then, the Mth channel in our sequence may be represented as in Fig. 9.

Plots of $C_{sM}$ and $C'_{sM}$ vs M for Example 2 are found in Fig. 8(b).

C.  CAPACITY  THEOREMS  FOR  MSCC  CHANNELS 
Theorem 4.1.

When the channel state is known at the receiver, the capacity per subchannel [defined by
Eq. (4-2)] is the same as the capacity of a single subchannel standing alone, i.e.,

$$ C'_{sM} = C'_1 . \tag{4-3} $$

Proof.

The result follows directly from Theorem 2.7.

Theorem 4.1 is illustrated by the two horizontal lines in Fig. 8(a-b). However, an MS channel
need not be MSCC for the theorem to hold; it holds whenever the individual subchannel
capacities (with state known at the receiver) are all equal. This last is certainly true if for each $\alpha \in A$
the probability that the ith subchannel is in state $\alpha$ is independent of i.

Theorem 4.2.

If

$$ H = -\sum_{\alpha \in A} p(\alpha) \log p(\alpha) $$

is finite, then

$$ \lim_{M \to \infty} C_{sM} = C'_1 . \tag{4-4} $$

Proof.

Applying Theorems 2.3 or 2.4 to Eq. (3-3), we get

$$ C'_M - H \le C_M \le C'_M . $$

Hence,

$$ \frac{C'_M}{M} - \frac{H}{M} \le \frac{C_M}{M} \le \frac{C'_M}{M} . $$

Applying Eqs. (4-2) and (4-3), we get

$$ C'_1 - \frac{H}{M} \le C_{sM} \le C'_1 . $$

Hence, passing to the limit, we obtain Eq. (4-4) directly.

The curves of Fig. 8 illustrate the limiting property of $C_{sM}$ which was just proved. We note
again that the channel need not be MSCC for the theorem to hold; in fact, Eq. (4-4) may obtain
even if a discrete entropy H does not exist. (See Appendix E.)
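
The limiting behavior in Theorem 4.2 can be reproduced for Examples 1 and 2 with a short computation. The sketch below is not the report's program. It relies on an assumption that holds for these two examples only: with the state unknown, the channel adds (modulo 2) a noise pattern Z that is independent of the input, so equiprobable inputs achieve capacity and C_M = M - H(Z) bits. Function names are hypothetical.

```python
import math

def binary_entropy_bits(eps):
    """Entropy of a Bernoulli(eps) variable in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def noise_entropy_bits(M, eps):
    """H(Z) for the length-M noise pattern: all-zero with probability 1/2 (state 1),
    i.i.d. Bernoulli(eps) with probability 1/2 (state 2)."""
    entropy = 0.0
    for w in range(M + 1):                  # group patterns by Hamming weight w
        q = 0.5 * eps ** w * (1.0 - eps) ** (M - w)
        if w == 0:
            q += 0.5
        if q > 0.0:
            entropy -= math.comb(M, w) * q * math.log2(q)
    return entropy

def capacity_per_subchannel_unknown(M, eps):
    """C_sM = C_M / M with the state unknown at the receiver (bits)."""
    return (M - noise_entropy_bits(M, eps)) / M

def capacity_per_subchannel_known(eps):
    """C'_1: average of the two per-state subchannel capacities (bits)."""
    return 0.5 * 1.0 + 0.5 * (1.0 - binary_entropy_bits(eps))

for name, eps in (("Example 1", 0.5), ("Example 2", 0.25)):
    for M in (1, 2, 5, 10, 20, 50, 100):
        print(name, M, round(capacity_per_subchannel_unknown(M, eps), 4),
              round(capacity_per_subchannel_known(eps), 4))
```

Since the state entropy H is 1 bit in both examples, the printed values of C_sM should lie between C'_1 - 1/M and C'_1, which is the sandwich used in the proof.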

D.  FURTHER  PROPERTIES  OF  EXAMPLES  1  AND  2 

Just as the capacity theorems were illustrated by previously given curves, so we shall
provide curves relating to the RCE's of our examples to illustrate the theorems which are to come.
The data on which the curves are based are as follows.

For M = 1, 2, 5, 10, 20, 50, and 100, we computed $E_M(R_s)$ and $E'_M(R_s)$ for equally spaced
values of $R_s$ from zero to $C_{sM}$ or $C'_{sM}$. The spacing was 0.025 bit. Furthermore, for each such
computation, the value of $\rho$ which achieves the maximum required by the definition of $E_M(R_s)$ or
$E'_M(R_s)$ is provided as output. The input probability vector which, for the case of M subchannels,
achieves the requisite maximum is the probability vector with each of its $2^M$ components equal to
$(1/2)^M$ (see Ref. 1†).
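
A computation of this kind can be sketched in a few lines. The code below is not the original program; it is a minimal illustration that assumes equiprobable inputs (as stated above), takes rates in bits and exponents in nats, and replaces the maximization over rho by a grid search. It uses the closed forms that F takes for Examples 1 and 2 when the inputs are equiprobable. All function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def f_state_known(rho, M, eps):
    """F'_M(rho, p) for equiprobable inputs: average over the two states of F_alpha^M."""
    s = 1.0 / (1.0 + rho)
    f_clean = 2.0 ** (-rho)                                        # crossover 0
    f_noisy = 2.0 ** (-rho) * ((1.0 - eps) ** s + eps ** s) ** (1.0 + rho)
    return 0.5 * f_clean ** M + 0.5 * f_noisy ** M

def f_state_unknown(rho, M, eps):
    """F_M(rho, p) for the state-averaged channel, equiprobable inputs.
    The averaged channel adds a noise pattern; patterns are grouped by weight w."""
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for w in range(M + 1):
        q = 0.5 * eps ** w * (1.0 - eps) ** (M - w)
        if w == 0:
            q += 0.5
        total += math.comb(M, w) * q ** s
    return 2.0 ** (-M * rho) * total ** (1.0 + rho)

def rce(f, M, eps, rate_bits, steps=1000):
    """Grid search over 0 <= rho <= 1 of -rho*M*R_s - ln F; returns (exponent, rho)."""
    best = (0.0, 0.0)
    for i in range(steps + 1):
        rho = i / steps
        value = -rho * M * rate_bits * LN2 - math.log(f(rho, M, eps))
        if value > best[0]:
            best = (value, rho)
    return best

for M in (1, 2, 5, 10, 20, 50, 100):
    e_unknown, rho_unknown = rce(f_state_unknown, M, 0.25, 0.3)
    e_known, rho_known = rce(f_state_known, M, 0.25, 0.3)
    print(M, round(e_unknown, 4), round(e_known, 4), rho_unknown, rho_known)
```

The loop at the end evaluates Example 2 at R_s = 0.3 bit; replacing 0.25 by 0.5 gives Example 1. A finer grid or a one-dimensional optimizer could replace the grid search if more accurate maximizing values of rho are wanted.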

In Figs. 10 through 23, the curves plotted from the data are:

Example 1

Fig. 10   $E_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 11   $E'_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 12   $E'_M(R_s) - E_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 13   $E_M(R_s)$ vs M for $R_s$ = 0 to 0.3 in steps of 0.025 bit
                              $R_s$ = 0.3 to 0.45 in steps of 0.05 bit
Fig. 14   $E'_M(R_s)$ vs M for $R_s$ = 0 to 0.3 in steps of 0.025 bit
                               $R_s$ = 0.3 to 0.45 in steps of 0.05 bit
Fig. 15   (state unknown) Maximizing $\rho$ vs M for $R_s$ = 0 to 0.45 in steps of 0.05 bit
Fig. 16   (state known) Maximizing $\rho$ vs M for $R_s$ = 0 to 0.45 in steps of 0.05 bit

Example 2

Fig. 17   $E_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 18   $E'_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 19   $E'_M(R_s) - E_M(R_s)$ vs $R_s$ for M = 1, 2, 5, 10, 20, 50, 100
Fig. 20   $E_M(R_s)$ vs M for $R_s$ = 0 to 0.2 in steps of 0.025 bit
                              $R_s$ = 0.2 to 0.3 in steps of 0.05 bit
                              $R_s$ = 0.3 to 0.5 in steps of 0.1 bit
Fig. 21   $E'_M(R_s)$ vs M for $R_s$ = 0 to 0.2 in steps of 0.025 bit
                               $R_s$ = 0.2 to 0.3 in steps of 0.05 bit
                               $R_s$ = 0.3 to 0.5 in steps of 0.1 bit
Fig. 22   (state unknown) Maximizing $\rho$ vs M for $R_s$ = 0 to 0.55 in steps of 0.05 bit
Fig. 23   (state known) Maximizing $\rho$ vs M for $R_s$ = 0 to 0.55 in steps of 0.05 bit

† Numbered references appear at the end of each chapter.


We  will  refer  to  these  figures  in  subsequent  sections  of  this  chapter. 


E.  RANDOM  CODING  EXPONENT  (RCE)  FOR  MSCC  CHANNELS 

We  shall  now  undertake  to  prove  a  number  of  general  properties  of  the  RCE's  of  a  sequence 
of  MSCC  channels.  We  begin  with  a  definition. 

Corresponding to each subchannel state $\alpha$, there is a unique subchannel conditional
probability distribution $p_\alpha(\xi/\eta)$, $\xi \in Y_s$, $\eta \in X_s$. This defines a channel with a capacity $C_\alpha$. If there
exists $a \in A$ with $p(a) > 0$ and

$$ C_a \le C_\alpha \quad \text{all } \alpha \in A \tag{4-5} $$

then we say that there exists a worst subchannel state a. (We have, in fact, not even assumed
that A is purely discrete, but only that a has a positive probability, as opposed to a positive
probability density.)

Theorem 4.3.

Suppose there exists a worst subchannel state a, with probability of occurrence p(a). Then,
for all $R_s > C_a$, we have

$$ E_M(R_s) \le E'_M(R_s) \le -\ln p(a) \tag{4-6} $$

for all M.

Proof.

Recall that, when the receiver knows the channel state,

$$ P_e \le \exp[-N E'_M(R_s)] . \tag{4-7} $$

Fig. 10. Random coding exponent vs rate per subchannel (state unknown), Example 1.
Fig. 11. Random coding exponent vs rate per subchannel (state known), Example 1.
Fig. 12. Difference between state known and unknown random coding exponents vs rate per subchannel, Example 1.
Fig. 13. Random coding exponent vs number of subchannels (state unknown), Example 1.
Fig. 14. Random coding exponent vs number of subchannels (state known), Example 1.
Fig. 15. Maximizing $\rho$ vs number of subchannels (state unknown), Example 1.
Fig. 16. Maximizing $\rho$ vs number of subchannels (state known), Example 1.
Fig. 17. Random coding exponent vs rate per subchannel (state unknown), Example 2.
Fig. 18. Random coding exponent vs rate per subchannel (state known), Example 2.
Fig. 19. Difference between state known and unknown random coding exponents vs rate per subchannel, Example 2.
Fig. 20. Random coding exponent vs number of subchannels (state unknown), Example 2.
Fig. 21. Random coding exponent vs number of subchannels (state known), Example 2.
Fig. 22. Maximizing $\rho$ vs number of subchannels (state unknown), Example 2.
Fig. 23. Maximizing $\rho$ vs number of subchannels (state known), Example 2.

This is just Eq. (3-14) for the state known case. Since Eq. (4-7) is derived using
maximum-likelihood decoding at the receiver, it is also true for maximum a posteriori probability (MAP)
decoding when the inputs are equiprobable. Recall, too, that $P_e$ is an average probability of
error over an ensemble of codes.

For a particular code, with block length N, the probability of error p(e) satisfies

$$ p(e) \ge p(e, a^N) = p(a^N)\, p(e/a^N) \tag{4-8} $$

where $a^N$ refers to N consecutive occurrences of the worst state a. Since our channel is
memoryless,

$$ p(a^N) = [p(a)]^N . \tag{4-9} $$

Define

$$ H(e/a^N) = -p(e/a^N) \ln p(e/a^N) - [1 - p(e/a^N)] \ln [1 - p(e/a^N)] . \tag{4-10} $$

Then, letting $W = \exp[NMR_s]$ be the number of (equiprobable) code words, we have (Ref. 2)

$$ H(e/a^N) + p(e/a^N) \ln (W - 1) \ge NM(R_s - C_a) . $$

Thus,

$$ p(e/a^N) \ge \frac{NM(R_s - C_a) - H(e/a^N)}{\ln (W - 1)} \ge \frac{NM(R_s - C_a)}{NMR_s} - \frac{\ln 2}{NMR_s} \tag{4-11} $$

and

$$ \frac{R_s - C_a}{R_s} - \frac{\ln 2}{NMR_s} \le p(e/a^N) \le 1 . \tag{4-12} $$


Since inequalities in Eqs. (4-8) and (4-12) hold for each code in an ensemble, they must hold after
being averaged over the ensemble of codes. Thus,† Eqs. (4-7), (4-8), and (4-9) become

$$ [p(a)]^N P_{e/a^N} \le P_e \le \exp[-N E'_M(R_s)] \tag{4-13} $$

and Eq. (4-12) becomes

$$ \frac{R_s - C_a}{R_s} - \frac{\ln 2}{NMR_s} \le P_{e/a^N} \le 1 . \tag{4-14} $$

From Eq. (4-13), we get

$$ E'_M(R_s) \le -\ln p(a) - \frac{1}{N} \ln P_{e/a^N} \tag{4-15} $$

for all N. Passing to the limit $N \to \infty$, we get

$$ E'_M(R_s) \le -\ln p(a) - \lim_{N \to \infty} \frac{1}{N} \ln P_{e/a^N} . \tag{4-16} $$

From Eq. (4-14), we obtain

$$ \lim_{N \to \infty} \frac{1}{N} \ln P_{e/a^N} = 0 . \tag{4-17} $$

† We denote the ensemble averages of p(e) and $p(e/a^N)$ by $P_e$ and $P_{e/a^N}$.

Thus, Eqs. (4-16) and (4-17) combine to give us

$$ E'_M(R_s) \le -\ln p(a) $$

as required. By Theorem 3.5 ($E'_M(R_s)$ is denoted $E^{g}(R)$ in the statement of the theorem), we
have also

$$ E_M(R_s) \le E'_M(R_s) \tag{4-18} $$

for all M.


We note that $C_a = 0$ in Example 1, and $C_a = 0.1887$ bit in Example 2. Hence, the bounded
behavior of the RCE's is seen in all the curves of Figs. 13 and 14, and in those curves in Figs. 20
and 21 for which $R_s > 0.1887$ bit.
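
The two worst-state capacities quoted above follow from the capacity of a binary symmetric channel, 1 - H(eps) bits, evaluated at crossover 1/2 and 1/4. A short check (the function name is ours, not the report's):

```python
import math

def bsc_capacity_bits(eps):
    """Capacity in bits of a binary symmetric channel with crossover probability eps."""
    if eps in (0.0, 1.0):
        return 1.0
    return 1.0 + eps * math.log2(eps) + (1.0 - eps) * math.log2(1.0 - eps)

worst_capacity_example_1 = bsc_capacity_bits(0.5)    # worst state of Example 1
worst_capacity_example_2 = bsc_capacity_bits(0.25)   # worst state of Example 2
print(worst_capacity_example_1, round(worst_capacity_example_2, 4))
```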

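It will now be convenient to restate some of the results of Chapter 3 in MSCC channel
notation, and to provide further definitions which will be useful in the sequel. Following Eq. (3-34),
we define

$$ \begin{aligned} F'_M(\rho, \mathbf{p}) &= \sum_{\alpha \in A} p(\alpha) \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p_\alpha(j/k)^{1/(1+\rho)} \right]^{1+\rho} \\ &= \sum_{\alpha \in A} p(\alpha) \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p(x_1, \ldots, x_M) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} . \end{aligned} \tag{4-19} $$

Then, we define

$$ E'_{oM}(\rho, \mathbf{p}) = -\ln F'_M(\rho, \mathbf{p}) \tag{4-20} $$

$$ E'_M(\rho, \mathbf{p}, R_s) = -\rho M R_s + E'_{oM}(\rho, \mathbf{p}) \tag{4-21} $$

and

$$ E'_M(R_s) = \max_{0 \le \rho \le 1,\; \mathbf{p} \in P} E'_M(\rho, \mathbf{p}, R_s) \tag{4-22} $$

where, again, P is the space of all input probability vectors. If we define

$$ F_{\alpha M}(\rho, \mathbf{p}) = \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p(x_1, \ldots, x_M) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} \tag{4-23} $$

then

$$ F'_M(\rho, \mathbf{p}) = \sum_{\alpha \in A} p(\alpha)\, F_{\alpha M}(\rho, \mathbf{p}) . \tag{4-24} $$

Finally, we define

$$ E_{o\alpha M}(\rho, \mathbf{p}) = -\ln F_{\alpha M}(\rho, \mathbf{p}) . \tag{4-25} $$

If M = 1, $\mathbf{p}$ becomes $\mathbf{p}_s$ (a subchannel input probability vector), and we shall generally drop the
M subscript on $E_{o\alpha M}$ and $F_{\alpha M}$. Thus, $E_{o\alpha}(\rho, \mathbf{p}_s) = E_{o\alpha 1}(\rho, \mathbf{p}_s)$ and $F_\alpha(\rho, \mathbf{p}_s) = F_{\alpha 1}(\rho, \mathbf{p}_s)$ are
quantities relating to a single subchannel.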

Theorem  4.4. 


Suppose there is a worst subchannel state a, and $R_s > C_a$. Let $\rho'_M$ be the value of $\rho$ which
achieves the maximum required by the definition of $E'_M(R_s)$ [Eq. (4-22)]. Then,

$$ \lim_{M \to \infty} \rho'_M = 0 \tag{4-26} $$

and

$$ \lim_{M \to \infty} (\rho'_M)^2 M = 0 . \tag{4-27} $$

Proof.

From Eqs. (4-6), (4-21), and (4-22),

$$ -\rho M R_s + E'_{oM}(\rho, \mathbf{p}) \le -\ln p(a) \tag{4-28} $$

for all $R_s > C_a$, $0 \le \rho \le 1$, $\mathbf{p} \in P$. Thus,

$$ E'_{oM}(\rho, \mathbf{p}) \le -\ln p(a) + \rho M C_a \tag{4-29} $$

for all $0 \le \rho \le 1$, $\mathbf{p} \in P$. Equations (4-28) and (4-29) combine to give

$$ -\rho M R_s + E'_{oM}(\rho, \mathbf{p}) \le -\ln p(a) + \rho M (C_a - R_s) \tag{4-30} $$

for all $0 \le \rho \le 1$, $\mathbf{p} \in P$, and $R_s > C_a$. Since $E'_M(R_s)$ is non-negative, Eqs. (4-30), (4-21), and
(4-22) combine to give

$$ 0 \le E'_M(R_s) \le -\ln p(a) + \rho'_M M (C_a - R_s) $$

and thus we obtain

$$ 0 \le \rho'_M \le \frac{-\ln p(a)}{M(R_s - C_a)} . \tag{4-31} $$

From Eq. (4-31), we get Eqs. (4-26) and (4-27) directly.

We note that the proof could just as well be carried through if $\rho_M$ were the value of $\rho$ which
achieves the maximum required by the definition of $E_M(R_s)$.

The behavior of $\rho_M$ just proved is illustrated in the curves of Figs. 15 and 16, and in those
curves of Figs. 22 and 23 for which $R_s > 0.1887$ bit. Since the slopes of these curves are all
minus one for large M, they suggest that indeed $\rho_M M$ is equal to a constant independent of M
for M sufficiently large. However, the constant is smaller than that suggested by the rightmost
expression in Eq. (4-31).
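
The bound of Eq. (4-31) and the roughly 1/M decay of the maximizing rho can be observed numerically for Example 2 with the state known. The sketch below is an illustration, not the report's program: it assumes equiprobable inputs, takes rates in bits (converted to nats inside the functions), and uses a grid search in place of an exact maximization. Names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def maximizing_rho(M, eps, rate_bits, steps=2000):
    """Grid value of rho in [0, 1] maximizing -rho*M*R_s - ln F'_M for the
    two-state example (noiseless subchannels or BSC(eps), each with probability 1/2)."""
    best_value, best_rho = 0.0, 0.0
    for i in range(steps + 1):
        rho = i / steps
        s = 1.0 / (1.0 + rho)
        f_noisy = 2.0 ** (-rho) * ((1.0 - eps) ** s + eps ** s) ** (1.0 + rho)
        f_total = 0.5 * 2.0 ** (-rho * M) + 0.5 * f_noisy ** M
        value = -rho * M * rate_bits * LN2 - math.log(f_total)
        if value > best_value:
            best_value, best_rho = value, rho
    return best_rho

def rho_upper_bound(M, rate_bits, worst_capacity_bits, p_worst=0.5):
    """Right-hand side of Eq. (4-31), with rates converted from bits to nats."""
    return -math.log(p_worst) / (M * (rate_bits - worst_capacity_bits) * LN2)

c_a = 1.0 + 0.25 * math.log2(0.25) + 0.75 * math.log2(0.75)   # about 0.1887 bit
table = [(M, maximizing_rho(M, 0.25, 0.3), rho_upper_bound(M, 0.3, c_a))
         for M in (1, 2, 5, 10, 20, 50, 100)]
for M, rho_m, bound in table:
    print(M, rho_m, round(bound, 4), round(M * rho_m, 3))
```

The last printed column is M times the maximizing rho, which is the quantity the text suggests becomes constant for large M.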

Although the assertions of the theorem just proved are technical in the sense that they are
not subject to immediate physical interpretation, their consequences are quite striking. One
such consequence is given by the following theorem.

Theorem 4.5.

If there exists a worst subchannel state a, $R_s > C_a$, and A is finite, then,

$$ \lim_{M \to \infty} [E'_M(R_s) - E_M(R_s)] = 0 . \tag{4-32} $$

Proof.

The result follows directly from Eq. (4-26) and Theorem 3.6.


The  result  of  Theorem  4.5  is  illustrated  in  Figs.  12  and  19.  For  the  examples  computed, 

Eq.  (4-32)  appears  to  hold  at  all  rates.  Note  that  the  difference  in  RCE's  is  not  monotone  in  M. 

Since the difference $E'_M(R_s) - E_M(R_s)$ approaches zero with increasing M, under the
conditions stated, it is natural to ask whether either term (and hence both terms) approaches a limit
under similar conditions. This question will be answered in the affirmative after some labor.

First, it will be necessary to study the properties of the input probability vector $\mathbf{p}'_M$, which
achieves the maximum in Eq. (4-22). Now, from Eqs. (4-20), (4-21), and (4-22), we have

$$ \begin{aligned} E'_M(R_s) &= \max_{0 \le \rho \le 1,\; \mathbf{p} \in P} \left[ -\rho M R_s - \ln F'_M(\rho, \mathbf{p}) \right] \\ &= \max_{0 \le \rho \le 1} \left[ -\rho M R_s - \ln \min_{\mathbf{p} \in P} F'_M(\rho, \mathbf{p}) \right] . \end{aligned} \tag{4-33} $$

Thus, we shall be concerned with the properties of probability vectors (distributions) which
minimize $F'_M(\rho, \mathbf{p})$.

Definition.

A distribution p over a product space $V^M$ is said to have permutational symmetry if

$$ p(v_1, \ldots, v_M) = p(v_{j_1}, \ldots, v_{j_M}) \tag{4-34} $$

for all permutations $(j_1, \ldots, j_M)$ of the integers from 1 to M.

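Theorem 4.6.

For an MSCC channel, the $\min_{\mathbf{p}} F'_M(\rho, \mathbf{p})$ may, for any $\rho$, $0 \le \rho \le 1$, be achieved with an input
distribution having permutational symmetry.

Proof.

Let

$$ F'_M(\rho, \mathbf{p}^*) = \min_{\mathbf{p} \in P} F'_M(\rho, \mathbf{p}) \tag{4-35} $$

where $\mathbf{p}^*$ is implicitly a function of $\rho$ and M, and we may write $\mathbf{p}^* = p^*(x_1, \ldots, x_M)$.

Define

$$ \bar{p}(x_1, \ldots, x_M) = \frac{1}{M!} \sum_{PT} p^*(x_{j_1}, \ldots, x_{j_M}) \tag{4-36} $$

where $\sum_{PT}$ denotes the sum over all M! possible permutations of the integers from 1 to M. For
any permuted input distribution $p(x_{j_1}, \ldots, x_{j_M})$, we have

$$ \begin{aligned} \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p(x_{j_1}, \ldots, x_{j_M}) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} &= \sum_{y_{j_1}, \ldots, y_{j_M}} \left[ \sum_{x_{j_1}, \ldots, x_{j_M}} p(x_{j_1}, \ldots, x_{j_M}) \prod_{i=1}^{M} p_\alpha(y_{j_i}/x_{j_i})^{1/(1+\rho)} \right]^{1+\rho} \\ &= \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p(x_1, \ldots, x_M) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} . \end{aligned} \tag{4-37} $$

Since $F_{\alpha M}(\rho, \mathbf{p})$ is a convex downward function† of $\mathbf{p}$,

$$ \begin{aligned} F_{\alpha M}(\rho, \bar{\mathbf{p}}) &\le \frac{1}{M!} \sum_{PT} \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p^*(x_{j_1}, \ldots, x_{j_M}) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} \\ &= \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} p^*(x_1, \ldots, x_M) \prod_{i=1}^{M} p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} \\ &= F_{\alpha M}(\rho, \mathbf{p}^*) \quad \text{all } \alpha \in A \end{aligned} \tag{4-38} $$

where we use Eq. (4-37) with $p = p^*$. From Eqs. (4-38) and (4-24),

$$ F'_M(\rho, \bar{\mathbf{p}}) \le F'_M(\rho, \mathbf{p}^*) . \tag{4-39} $$

But, by Eq. (4-35),

$$ F'_M(\rho, \mathbf{p}^*) \le F'_M(\rho, \bar{\mathbf{p}}) . \tag{4-40} $$

Hence,

$$ F'_M(\rho, \bar{\mathbf{p}}) = F'_M(\rho, \mathbf{p}^*) = \min_{\mathbf{p} \in P} F'_M(\rho, \mathbf{p}) \tag{4-41} $$

as required.

We note that the essential property of the MSCC channel which allows us to prove the result
is that the subchannel state distribution $p(\alpha_1, \ldots, \alpha_M)$ has permutational symmetry. This may
be demonstrated by a minor modification of the proof of Theorem 4.6.

† See Ref. 1, Theorem 4.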


Let $P_s$ be the space of all input probability vectors with permutational symmetry. We have
shown that for an MSCC channel,

$$ \min_{\mathbf{p} \in P} F'_M(\rho, \mathbf{p}) = \min_{\mathbf{p} \in P_s} F'_M(\rho, \mathbf{p}) \tag{4-42} $$

for all $\rho$, $0 \le \rho \le 1$.
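
The symmetrization step of Eq. (4-36) can be tried on a small case. The following sketch is an illustration under assumed data (M = 3 binary subchannels, two equiprobable states, an arbitrary asymmetric input distribution); it averages the input distribution over all permutations and compares the state-known F before and after. Function names are hypothetical.

```python
import itertools
import math

def f_alpha_m(rho, p_joint, sub_channel, M):
    """F_{alpha M}(rho, p) of Eq. (4-23); p_joint maps input M-tuples to probabilities,
    sub_channel[x][y] is the single-subchannel transition probability in state alpha."""
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for y in itertools.product((0, 1), repeat=M):
        inner = 0.0
        for x, px in p_joint.items():
            likelihood = math.prod(sub_channel[xi][yi] for xi, yi in zip(x, y))
            inner += px * likelihood ** s
        total += inner ** (1.0 + rho)
    return total

def f_state_known(rho, p_joint, p_state, sub_channels, M):
    """F'_M(rho, p) as the state average of Eq. (4-24)."""
    return sum(pa * f_alpha_m(rho, p_joint, ch, M) for pa, ch in zip(p_state, sub_channels))

def symmetrize(p_joint, M):
    """Average of the permuted distributions, as in Eq. (4-36)."""
    perms = list(itertools.permutations(range(M)))
    return {x: sum(p_joint[tuple(x[j] for j in perm)] for perm in perms) / len(perms)
            for x in p_joint}

def bsc(eps):
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]

M = 3
inputs = list(itertools.product((0, 1), repeat=M))
weights = [1, 2, 3, 4, 5, 6, 7, 8]                      # deliberately asymmetric
p_joint = {x: w / sum(weights) for x, w in zip(inputs, weights)}
p_sym = symmetrize(p_joint, M)

p_state = [0.5, 0.5]
sub_channels = [bsc(0.0), bsc(0.25)]
for rho in (0.25, 0.5, 1.0):
    print(rho, f_state_known(rho, p_joint, p_state, sub_channels, M),
          f_state_known(rho, p_sym, p_state, sub_channels, M))
```

In each printed row the second value (symmetrized input) should not exceed the first, which is the content of Eq. (4-39) applied to an arbitrary starting distribution rather than to the minimizer.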

Definition.

If $p(\eta)$ is a probability distribution on $X_s$, and

$$ p(x_1, \ldots, x_M) = \prod_{i=1}^{M} p(x_i) \tag{4-43} $$

then we say $p(x_1, \ldots, x_M)$ is a product distribution. Let D be the set of all product distributions.†
Clearly, $D \subset P_s$. If $\mathbf{p}_s$ is the probability vector corresponding to $p(\eta)$, $\mathbf{p}$ is the probability vector
corresponding to $p(x_1, \ldots, x_M)$, and Eq. (4-43) holds, then we shall write

$$ \mathbf{p} = (\mathbf{p}_s)^M . \tag{4-44} $$

We shall also write $\mathbf{p} \in D$. Finally, we shall denote the set of subchannel input probability vectors
by U.

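We shall now examine the properties of functions from which $E'_M(R_s)$ is derived if $\mathbf{p} \in D$. From
Eqs. (4-23) and (4-43),

$$ \begin{aligned} F_{\alpha M}(\rho, \mathbf{p}) &= \sum_{y_1, \ldots, y_M} \left[ \sum_{x_1, \ldots, x_M} \prod_{i=1}^{M} p(x_i)\, p_\alpha(y_i/x_i)^{1/(1+\rho)} \right]^{1+\rho} \\ &= \left\{ \sum_{y_1} \left[ \sum_{x_1} p(x_1)\, p_\alpha(y_1/x_1)^{1/(1+\rho)} \right]^{1+\rho} \right\}^{M} = [F_\alpha(\rho, \mathbf{p}_s)]^M . \end{aligned} \tag{4-45} $$

Henceforth, we shall assume that $X_s = \{1, \ldots, L\}$, and $Y_s = \{1, \ldots, Q\}$. Thus, we have

$$ F_\alpha(\rho, \mathbf{p}_s) = \sum_{q=1}^{Q} \left[ \sum_{\ell=1}^{L} p(\ell)\, p_\alpha(q/\ell)^{1/(1+\rho)} \right]^{1+\rho} . \tag{4-46} $$

Using Eqs. (4-24), (4-45), and (4-25), we have

$$ F'_M(\rho, \mathbf{p}) = \sum_{\alpha \in A} p(\alpha) [F_\alpha(\rho, \mathbf{p}_s)]^M = \sum_{\alpha \in A} p(\alpha) \exp[-M E_{o\alpha}(\rho, \mathbf{p}_s)] \tag{4-47} $$

for all $\mathbf{p} \in D$ and all $\rho$, $0 \le \rho \le 1$. More generally, Eqs. (4-24) and (4-25) yield

$$ F'_M(\rho, \mathbf{p}) = \sum_{\alpha \in A} p(\alpha) \exp[-E_{o\alpha M}(\rho, \mathbf{p})] \tag{4-48} $$

for all $\mathbf{p} \in P$ and all $\rho$, $0 \le \rho \le 1$.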
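
The factorization in Eq. (4-45) is easy to confirm by brute force for a small M. The sketch below is illustrative only: it assumes a subchannel with L = 2 inputs and Q = 3 outputs and an arbitrary subchannel input vector, evaluates Eq. (4-23) directly with a product input distribution, and compares the result with the M-th power of Eq. (4-46). Names and numbers are hypothetical.

```python
import itertools

def f_single(rho, p_s, channel):
    """F_alpha(rho, p_s) of Eq. (4-46); channel[l][q] is p_alpha(q/l)."""
    s = 1.0 / (1.0 + rho)
    return sum(
        sum(p_s[l] * channel[l][q] ** s for l in range(len(p_s))) ** (1.0 + rho)
        for q in range(len(channel[0]))
    )

def f_product_brute_force(rho, p_s, channel, M):
    """F_{alpha M}(rho, p) of Eq. (4-23), evaluated directly for p = (p_s)^M."""
    s = 1.0 / (1.0 + rho)
    n_in, n_out = len(p_s), len(channel[0])
    total = 0.0
    for y in itertools.product(range(n_out), repeat=M):
        inner = 0.0
        for x in itertools.product(range(n_in), repeat=M):
            term = 1.0
            for xi, yi in zip(x, y):
                term *= p_s[xi] * channel[xi][yi] ** s
            inner += term
        total += inner ** (1.0 + rho)
    return total

channel = [[0.7, 0.2, 0.1],      # p_alpha(q / l = 1)
           [0.1, 0.3, 0.6]]      # p_alpha(q / l = 2)
p_s = [0.35, 0.65]
M = 3
for rho in (0.0, 0.5, 1.0):
    print(rho, f_product_brute_force(rho, p_s, channel, M), f_single(rho, p_s, channel) ** M)
```

The practical consequence is that, for product inputs, the exponent can be computed from a single-subchannel quantity, which avoids the sum over all input and output M-tuples.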


† Note that this is a more restrictive definition than in Chapter 2, because here we ask that all the individual
subchannel marginal distributions be the same.

Because of the functional form of $E_{o\alpha}(\rho, \mathbf{p}_s)$ and the fact that the sums used in its definition
are finite, all derivatives of $E_{o\alpha}(\rho, \mathbf{p}_s)$ with respect to $\rho$ are continuous with respect to $\rho$ and
the L + 1 probability vectors involved in the definition of $E_{o\alpha}$ (one subchannel input probability
vector and L subchannel output probability vectors, each conditioned on one input). Hence, by
an argument identical to that used in the proof of part (1) of Theorem 2.3, there exists a positive
constant B(L, Q) for which

$$ \left| \frac{\partial^2 E_{o\alpha}(\rho, \mathbf{p}_s)}{\partial \rho^2} \right| \le B(L, Q) \tag{4-49} $$

for all $\rho$, $0 \le \rho \le 1$, all subchannel input distributions $p(\ell)$, and all conditional probability
distributions $p_\alpha(q/\ell)$.


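Theorem 4.7. (Gallager)

Consider a channel with X = {1, ..., K}, Y = {1, ..., J}, and transition probabilities $p_\alpha(j/k)$,
$1 \le j \le J$, $1 \le k \le K$. Let $\mathbf{p} = [p(1), \ldots, p(K)]$ be an input probability vector, and assume that the
average mutual information

$$ I_{\alpha M}(\mathbf{p}) = \sum_{k=1}^{K} \sum_{j=1}^{J} p(k)\, p_\alpha(j/k) \ln \frac{p_\alpha(j/k)}{\sum_{i=1}^{K} p(i)\, p_\alpha(j/i)} \tag{4-50} $$

is nonzero. Define

$$ E_{o\alpha M}(\rho, \mathbf{p}) = -\ln \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p(k)\, p_\alpha(j/k)^{1/(1+\rho)} \right]^{1+\rho} . \tag{4-51} $$

Then, for $\rho \ge 0$,

$$ E_{o\alpha M}(\rho, \mathbf{p}) = 0 \quad \text{for } \rho = 0 \tag{4-52} $$

$$ E_{o\alpha M}(\rho, \mathbf{p}) > 0 \quad \text{for } \rho > 0 \tag{4-53} $$

$$ \frac{\partial E_{o\alpha M}(\rho, \mathbf{p})}{\partial \rho} > 0 \quad \text{for } \rho > 0 \tag{4-54} $$

$$ \left. \frac{\partial E_{o\alpha M}(\rho, \mathbf{p})}{\partial \rho} \right|_{\rho = 0} = I_{\alpha M}(\mathbf{p}) \tag{4-55} $$

$$ \frac{\partial^2 E_{o\alpha M}(\rho, \mathbf{p})}{\partial \rho^2} \le 0 \tag{4-56} $$

with equality in Eq. (4-56) if and only if both of the following conditions are satisfied:

(1) $p_\alpha(j/k)$ is independent of k for j, k such that $p(k)\, p_\alpha(j/k) \ne 0$.

(2) $\sum_{k:\, p_\alpha(j/k) \ne 0} p(k)$ is independent of j.

This is Theorem 2 of Gallager,† so the proof will not be given here. If $I_{\alpha M}(\mathbf{p}) = 0$, input and
output are independent. Thus, $p_\alpha(j/k) = p_\alpha(j)$ for $p(k) \ne 0$ $(1 \le j \le J)$. Then, using Eq. (4-51),
we have $E_{o\alpha M}(\rho, \mathbf{p}) = 0$ for all $\rho \ge 0$.

Note that if the channel referred to in Theorem 4.7 is MSCC, the definition of $E_{o\alpha M}$ by
Eq. (4-51) is consistent with the definition of $E_{o\alpha M}$ by Eqs. (4-23) and (4-25). Note, too, that
all the results of Theorem 4.7 apply to a single subchannel as well as to the whole channel. In
our notation, this means that the results hold if all M's are deleted and we make the following
changes:

j → q
k → ℓ
J → Q
K → L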

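For $\mathbf{p}_s \in U$, define‡

$$EN(t, \mathbf{p}_s, R_s) = -tR_s - \ln\sum_{\alpha\in A} p(\alpha)\exp[-t\,I_\alpha(\mathbf{p}_s)] \tag{4-57}$$

and

$$EN(R_s) = \underset{0\le t<\infty}{\text{l.u.b.}}\ \max_{\mathbf{p}_s\in U} EN(t, \mathbf{p}_s, R_s) \tag{4-58}$$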


Theorem 4.8.

(a) Suppose there exists a worst subchannel state a, and $C_a < R_s < C'_1$. Then, there exists a positive number $t_o$ with

$$EN(R_s) = \max_{\mathbf{p}_s\in U} EN(t_o, \mathbf{p}_s, R_s) \tag{4-59}$$

(b) If, in addition, there exists a single-subchannel probability vector $\mathbf{p}_s$ with

$$C_\alpha = I_\alpha(\mathbf{p}_s) \quad \text{all } \alpha \in A \tag{4-60}$$

then,

$$EN(R_s) = -t_oR_s - \ln\sum_{\alpha\in A} p(\alpha)\exp[-t_oC_\alpha] \tag{4-61}$$


† See Ref. 1, p. 6.

‡ Note that for each $\mathbf{p}_s$ and $R_s$, $\sum_\alpha p(\alpha)\exp\{t[R_s - I_\alpha(\mathbf{p}_s)]\}$ is the moment generating function $g(t, \mathbf{p}_s, R_s)$ associated with the random variable $R_s - I_\alpha(\mathbf{p}_s)$. Since by Eq. (4-57) $EN(t, \mathbf{p}_s, R_s) = -\ln g(t, \mathbf{p}_s, R_s)$, some of the properties of $EN(t, \mathbf{p}_s, R_s)$ which we shall derive may be obtained from the theory of moment generating functions. See, for example, Chapter 8 of Ref. 2.
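where $t_o$ is given implicitly and uniquely by

$$R_s = \frac{\sum_{\alpha\in A} p(\alpha)\,C_\alpha\exp[-t_oC_\alpha]}{\sum_{\alpha\in A} p(\alpha)\exp[-t_oC_\alpha]} \tag{4-62}$$

Proof.

(a) All that really needs to be proved is that

$$EN(R_s) \ne \limsup_{t\to\infty}\ \max_{\mathbf{p}_s\in U} EN(t, \mathbf{p}_s, R_s) \tag{4-63}$$

From Eq. (4-57),

$$EN(t, \mathbf{p}_s, R_s) \le -t\,[R_s - I_a(\mathbf{p}_s)] - \ln p(a) \le -t\,(R_s - C_a) - \ln p(a) \tag{4-64}$$

for all $\mathbf{p}_s \in U$. If we define

$$t^* = \frac{-\ln p(a)}{R_s - C_a} \tag{4-65}$$

$t > t^*$ implies

$$EN(t, \mathbf{p}_s, R_s) < 0 = EN(0, \mathbf{p}_s, R_s) \tag{4-66}$$

for all $\mathbf{p}_s \in U$. Thus, we have proven Eq. (4-63).

(b) Using the definition of $C_\alpha$, we obtain

$$\max_{\mathbf{p}_s\in U}\left\{-tR_s - \ln\sum_{\alpha\in A} p(\alpha)\exp[-t\,I_\alpha(\mathbf{p}_s)]\right\} \le -tR_s - \ln\sum_{\alpha\in A} p(\alpha)\exp[-tC_\alpha] \tag{4-67}$$

for all $t \ge 0$, with equality if

$$I_\alpha(\mathbf{p}_s) = C_\alpha \quad \text{all } \alpha \in A. \qquad [\text{Eq. (4-60)}]$$

Thus, if Eq. (4-60) holds, we have from Eqs. (4-57) and (4-58) that

$$EN(R_s) = \underset{0\le t<\infty}{\text{l.u.b.}}\left\{-tR_s - \ln\sum_{\alpha\in A} p(\alpha)\exp[-tC_\alpha]\right\} = -\ln\left(\underset{0\le t<\infty}{\text{g.l.b.}}\left\{\sum_{\alpha\in A} p(\alpha)\exp[t(R_s - C_\alpha)]\right\}\right) \tag{4-68}$$

Let

$$\varphi(t) = \sum_{\alpha\in A} p(\alpha)\exp[t(R_s - C_\alpha)] \tag{4-69}$$

Then,

$$\varphi'(t) = \frac{d\varphi}{dt} = \sum_{\alpha\in A} p(\alpha)\,(R_s - C_\alpha)\exp[t(R_s - C_\alpha)] \tag{4-70}$$

and

$$\varphi''(t) = \frac{d^2\varphi}{dt^2} = \sum_{\alpha\in A} p(\alpha)\,(R_s - C_\alpha)^2\exp[t(R_s - C_\alpha)] \tag{4-71}$$

From Eq. (4-70),

$$\varphi'(0) = R_s - \sum_{\alpha\in A} p(\alpha)\,C_\alpha = R_s - C'_1 < 0. \tag{4-72}$$

From Eq. (4-71),

$$\varphi''(t) > 0 \tag{4-73}$$

for all $t \ge 0$. Thus, $\varphi(t)$ is strictly convex downward and must, by Eqs. (4-72), (4-68), and part (a) of this theorem, have its g.l.b. at its stationary point. Thus, setting $\varphi'(t_o)$ to zero, we obtain Eq. (4-62).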
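The implicit equation (4-62) is easy to solve numerically, because its right-hand side is a decreasing function of t. The following is a minimal sketch, assuming a finite state set and that Eq. (4-60) holds; the function names and the two-state example (capacities 0.1 and 0.5 nat, equiprobable states, rate 0.2 nat) are hypothetical and chosen only for illustration.

```python
import math

def tilted_mean(t, probs, caps):
    """Right-hand side of Eq. (4-62) for a trial value t."""
    w = [p * math.exp(-t * c) for p, c in zip(probs, caps)]
    return sum(wi * c for wi, c in zip(w, caps)) / sum(w)

def solve_t0(rate, probs, caps, tol=1e-12):
    """Bisection for t_o; assumes min(caps) < rate < sum(p * C)."""
    lo, hi = 0.0, 1.0
    while tilted_mean(hi, probs, caps) > rate:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, probs, caps) > rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def en_value(t, rate, probs, caps):
    """Bracketed expression of Eq. (4-68) at a given t."""
    return -t * rate - math.log(sum(p * math.exp(-t * c) for p, c in zip(probs, caps)))

# Assumed example values (not from the report).
probs, caps, rate = [0.5, 0.5], [0.1, 0.5], 0.2
t0 = solve_t0(rate, probs, caps)
en = en_value(t0, rate, probs, caps)
```

For this example the stationarity condition reduces to exp(-0.4 t) = 1/3, so the bisection result can be compared against ln(3)/0.4.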


For  p  c  D,  define 


E^^(p,  P,  Rg)  = -pMRg  -  In  E  P(«)  exp[-pMI^(  p^)] 

O'  €A 


(4-74) 


where  p  =  ( 


and 


0<p^l 

peD 


Theorem  4.9. 


Suppose  there  exists  a  worst  subchannel  state  a,  and  C  <  R  <  C’  Then,  if  t  is  defined 

a  s  1  o 

by  Eq.  (4-59),  and  p(M)  is  the  value  of  p  which  achieves  the  maximum  in  Eq.  (4-75),  we  have 
for  M  t 

o 

Ej^(Rg)  =  EN(Rg)  (4-76) 

and 

p(M)  =  t^/JVI  .  (4-77) 

Furthermore, 

Urn  E,.(R  )  =  EN(R  )  .  (4-78) 

M->«>  ^  ® 

Proof. 

A  comparison  of  Eqs.  (4-74)  and  (4-75)  with  Eqs.  (4-57)  and  (4-58)  makes  Eqs.  (4-76)  and 
(4-77)  obvious  consequences  of  Theorem  4.8.  Equation  (4-78)  is  a  consequence  of  Eq.(4-76). 
Define


E'^ip.p.R  ) 

0^P<1 

pc  D 


=  max  [-pMR  -InF'  (p,p)] 

0^p<l  ® 

pcD 

=  max  [— pMR^— Inmin  F’  (p,p)l  .  (4-79) 

0<p^l  ®  peD 

This  definition  differs  from  that  of  only  in  that  the  maximization  over  input  probability 

vectors  is  over  D  rather  than  P. 

Note  that  Eq.  (4-33)  and  Theorem  4.6  imply 

E'  (R  )  =  max  [-pMR  -  In  F'  (p,  p  )] 

'  '  ®  0^p<l  ® 


=  max  [-  pMH  —  In  min  PIt(p,  p)]  (4-80) 

0<p«'l  ®  pcP^ 


Equations  (4-74)  and  (4-75)  imply 


E.-(R  )  =  max 

®  0<p^l 

p  eU 


-pMR^-ln  ^ 
acA 


p(«)  exp[-pMI^(  Pg)] 


(4-81) 


Theorem  4.10. 


(a)  If  there  exists  a  worst  subchannel  state  a,  and  C  "  R  C'  then 

’  a  s  1 

11m  E’  (R^)  =  lim  E  (R  )  =  EN(R  )  .  (4-82) 

Mlvl  o  TV  /I  1 V 1  o  o 

-*oo 

(b)  Suppose  there  exists  a  single  subchannel  input  probability  vector  p^  satisfying  ICq.  (4-60). 
Associate  p(i)  with  p^  and  assume  that  the  subchannel  conditional  probability  distributions  p^  (q/O 
satisfy 

is  independent  of  f  (4-83) 

for  q,  ^  with  p(f)  p^(q/f)  ^  0  and  all  cv  c  A .  Assume,  too,  that 


2^  p(f)  is  independent  of  q 

l:p^{q/l)¥^0 


(4-84) 


for  all  cv  c  A.  Then,  there  exists  a  positive  number  t^  defined  by  Eq.  (4-62)  such  that  M  ^ 
implies 


=  i™ 

T-^oo 


(4-85) 
Proof.


(a)  From  Tayloi-'s  theorem  and  Theorem  4.7,  wv  have  for  any  0  1.  pi  P.  and  p  t  l\ 


P>  ^  - 


^  '■'''■-oo.m'"-''’ 


w hvrv  0  tip)  p  <  1 .  and 


"Z" 


dp 


tip) 


-  pVm(  p' 


E 


Z  ()  T]  (p.  p  ) 
p  OO'  '  ‘  s 


=PI„(P,,)  +  V  - 7T 


^P 


lip) 


\\her('  0  t(p)  ■  p  ^  1  From  T^q.(4-49)  and  Th(‘or('m  4.7.  we  get 

d^K  (p,  p  ) 

m\  .Q)<  — ^  ^  • 

dp 

•Thus,  from  ]]qs.(4-48)  aiui  (4-86), 

\' 


P\j(p.  P  ) 


p(o)  exp  P)] 


O'  (  A 


(4-86) 


(4-87) 


(4-88) 


(9-89) 


From  Eq.  (4-80).  for  purposes  of  minimizing  )i  ^'I’lay  assume  p  i  .  Making  tliis 

assumption.  \\v  define 


p^(x^)  =  ^  p(x^ . x^|)  (4-90) 

X. 

1 

Since  i)  c  P  .  we  hav(' 

1  j..' 

p.(x.)  =  p(x.)  all  i.  1  V.  i^  M  (4-91) 


Define' 


M 

. W[’  ^  n  p(>^i) 

i=l 


(4-9Z) 


and  associate  Pj^^  witl'i  •  •  •  »  l^x^).  Then,  by  the  remark  following 

Eq.  (Z-Z9). 


O'  s 

By  ]lqs.(4-89)  and  (4-93). 


(-1-93) 


lU  ^  ^  p(o’)  (.'xp[-pMI^^(  Pj,)|  (4-94) 

(v  i  A 

for  all  p,  0<  p  1.  and  all  p*  e  P^.  From  Eqs.  (4-79).  (4-80),  and  Theorem  Z.3,  we  get  the 
left  Inequality  below: 

F^,(Rg)<  E|^j(R^)^  E^j(R^)  .  (4-95) 

The  right  inequality  is  obtained  from  Eqs.  (4-80),  (4-81),  (4-94),  and  Theoi*em  Z.3. 
(4-96)


Let  p,  be  such  as  to  achieve  the  maximum  in  Eq.  (4-75).  From  Eq.  (4-95), 


o<Em(Rs)-e;^(Rs)<e^,j(r^)-e^j(r^)  .  • 


From  Eq.  (4-79), 


E,,(Rs)  -  S^,(R^)  <  E^(R^)  -  EJ^  [  p,  (  p;)^,  R 


P"rom  Eqs.  ( 4-20),  (4-21),  (4-47),  and  (4-87), 
Ejvi  [P.  (  Pg)^',  Rgl  =  -pMRg  -  In  ^  p(ff )  exp 


aeA 


2  0  E  ip,  p  ) 

-pMi  (?  )-e3  ■ — 

^  O'  2 


for  all  p,  0  <  p  <  1,  and  €  U.  Using  Eq.  (4-88),  we  have 


Yj  p(«)  exp[-p  MI^(  p^)] 
a  eA 


dp  4(P) 


+  B(  L,  Q) 


Thus,  using  Eqs.(4-81)  and  (4-77),  we  get 

,M 


^m(Rs>  -  Ejvi  [  P.  (  Rgl  <  B(  L,  Q)  =  ^  B(  L,  Q) 

Combining  Eqs.(4-96),  (4-97),  and  (4-100),  we  get 


0<  E^(R^)  -  Ey  R^)  <  Ej^j(R^)  -  E^(R^)  <  ^  B(  L.  Q) 


Thus, 

Urn  [E^(R^)-E;^(R^)]=  Urn  (  Ej^(  R  )  -  E^(  R  )]  =  0  . 

From  Eqs.(4-78)  and  (4-102),  we  get  Eq.(4-82),  as  required. 

(b)  Equations  (4-67),  (4-74),  and  (4-75)  imply  that  for  p^  satisfying  Eq.  (4-60), 

^M<Rs>  =  • 

0^P<1 

Equations  (4-83),  (4-84),  and  Theorem  4.7  imply 

a^E  (p,p  ) 

on'  ^s  _  PI 

Op 

for  all  p.  0  p  1.  From  Eqs.  (4-74),  (4-98),  and  (4-104),  for  p^  satisfying  Eq.  (4-60) 
p.  0  p  1,  we  have 

[P.  (  Ps>^-  Rgl  =  Rm  fp’  <  Ps'^'  Rgl  • 
nonce,  by  Dqs.  (4-95)  and  (4-lOJ). 

Thus,  for  M  >  Eqs.  (4-76),  (4-78),  and  (4-106)  give  the  result. 


(4-97) 

(4-98) 

(4-99) 

(4-100) 

(4-101) 

(4-102) 

(4-103) 

(4-104) 

and  all 

(4-105) 

(4-106) 
The curves of Fig. 14 illustrate both parts of Theorem 4.10. In this case, the $\mathbf{p}_s$ of part (b) is given by $\mathbf{p}_s = (\tfrac12, \tfrac12)$. The curves of Fig. 21, corresponding to $R_s > 0.1887$ bit, illustrate part (a) of the theorem. Part (b) of Theorem 4.8 applies to both examples, with $\mathbf{p}_s = (\tfrac12, \tfrac12)$.

Corollary  1. 

Suppose  A  is  finite.  Fnder  the  assumptions  of  part  (a)  of  Theorem  4.10. 

.Urn  E,,(R^)  =  EX{R  )  .  (4-107) 

Proof. 


Combine  Theorems  4.10  and  4.5. 


The curves of Figs. 13 and 20, corresponding to $R_s > 0.1887$ bit, illustrate the corollary. Again, part (b) of Theorem 4.8 applies to both examples with $\mathbf{p}_s = (\tfrac12, \tfrac12)$.

Theorem  4.11. 

If  for  each  p,  0  <  p  1,  there  exists  a  single  p^  c  V  with 

min  F  (p,  p  )  =F  (p,  p’  )  (4-108) 

p  cL 
^  s 

for  all  O'  e  A,  then 

for  all  M. 

Proof. 


Theorem  5  of  Gallager  (see  Ref.  1,  p.lO)  and  our  Eqs.  (4-20)  and  (4-45)  imply 

_  „  „  _  IV’\i<P.Pl=  .min  i'ViP'Pci 

peP  pcD 


min  F  (p,  p  ) 
Tt  O'  ‘  s 
PsCl. 


Equations  ( 4- 1  08),  (4-44),  (4-45),  and  (4-110)  imply 


peP 


oM 


for  all  o  c  A.  Thus,  from  Eqs.  (4-24)  and  (4-111), 


(4-110) 


(4-111) 


min  F|y^j(p,  p)  =  [p,  (  Pg)^'  ] 

peP 

=  min  P  ) 

pe  D 


(4-112) 


and  from  E]qs.(4-33),  (4-79),  and  (4-112)  we  have  our  result. 


For  both  Examples  1  and  2,  Eq.  (4-108)  is  satisfied  for  all  p,  0  p  ^  1 ,  if  p^  =  {\  ,  }).  Thus, 
Eq.  (4-109)  holds  for  our  examples. 

One might wonder if Eq. (4-109) holds for all MSCC channels. The answer, although far from obvious, is that it does not. An example which demonstrates this fact is discussed in Appendix F.
Thus far, we have considered only the case where there exists a worst subchannel state a,

and  C  <R  <  C’  Now,  if  R  >  C’ 
a  s  1  si 

Em'Rs*  ■  ■  " 

for  all  M,  by  the  converse  to  the  coding  theorem.  It  remains  to  investigate  the  behavior  of 

E'  (R  )  when  R  <C  and  R  <  C’ 

M  s  s  a  si 


Theorem  4.12. 

If  there  exists  a  subchannel  input  distribution  and  a  positive  number  I,  for  which 

I  (p  )  >I  >  R  all  fY  £  A  (4-113) 

cv  s  s 

then, 


E’  (R  )  -^  oo  as  M  -*  oo 

M  s 

If,  in  addition,  A  is  finite,  then 

E,«(R  )  -^  oo  as  M  oo 

IVl  s 


(4-114) 


(4-115) 


Proof. 

By  Eqs.  (4-87)  and  (4-88)  for  p^  satisfying  Eq.  (4-113),  all  p  >0  and  all  a  e  A,  we  have 


E 

oa 


<P,Ps. 


pi  ( 

^  ru  ^  G 


T 


dp^ 


I(P) 


2 

>pl  -  ^  R(L,  Q) 


(4-116) 


Hence,  by  Eqs.  (4-ZO),  (4-Zl),  (4-47),  and  (4-116), 


E;^^[P,  (  Rj  >-pMRg  -  In 


V 


p{a)  exp 


2 

-pMI  +  B(I  ,  Q) 


O'  c  A 


pIm 


>pM(I  -  Rg)  -  B(L.  Q) 


Let 


p'-  =  m  in 


I  -  R 


B(  L,  Q) 


Then, 


Clearly, 


'.(Pg) 


M 


o  1  ^  M 

R  1  >  min 


(I-R  )‘ 

"  -V'  -§(77%- 


M 

niin 


(I-Rs), 


B(L,  Q) 


—  oo  as  M  ^  oo 


(4-117) 


(4-118) 


Thus,  Eqs.  (4-2Z),  (4-117),  and  (4-118)  imply  that  Eq.  (4- 1 14)  holds.  If  A  is  finite,  Eqs.(3-5Z) 
and  (4-114)  combine  to  yield  Eq.  (4-115). 
The curves of Figs. 20 and 21, corresponding to $R_s < 0.1887$ bit $= I$, illustrate the theorem [$\mathbf{p}_s = (\tfrac12, \tfrac12)$].


Theorem  4.13, 

Let  A  be  finite.  If  for  each  subchannel  input  distribution  p^  there  exists  /3  c  A  with 


(4-119) 


then. 


(4-120) 


for  all  M,  wheret 


d  =  max  (— Inp(Q'))  <  <» 
aeA 


(4-121) 


Proof. 


Let 


where  p’  is  chosen  to  have  permutation  symmetry.  Using  Eqs.  (4-33)  and  (4-89),  wc  have 


(4-122) 


O'  eA 


If  p  is  the  single-subchannel  marginal  distribution  corresponding  to  p’,  Eqs.  (4-122)  and  (4-93) 
imply 


S  P(«)  exp[-p'MI^(pg)] 


a  €  A 

Then,  for  /3  satisfying  Eq.  (4-119)  for  this  particular  p^, 

EJyi(Rs)  <  -p'M  [Rg  -  I^(  p^)]  -lnp(^)  <  -lnp(/3) 


(4-123) 


Thus,  using  Eq.  (4-121),  we  have 

independently  of  p^,  and  hence  independently  of  M.  The  remainder  of  Eq.  (4-120)  is  provided  by 
Theorem  3.5. 

Theorem 4.13 extends the conditions under which the conclusion of Theorem 4.3 holds [with substitution of d for −ln p(a)]. One would expect that a similar extension is possible for Theorems 4.4, 4.5, 4.8, 4.9, and 4.10. This is indeed the case. Of course, some modification of the proofs of these theorems is required.

We  shall  close  this  chapter  with  a  result  on  monotonicity. 


† Recall that p(α) > 0 for all α ∈ A.
Theorem 4.14.


EM(Rg)  is  a  monotone  nondecreasing  function  of  M. 

Proof. 

First,  note  that  Eq.  (4-56)  implies  that  E^^(p,  p^)  is  a  convex  upward  function  of  p  for  each 
Pg  e  U  and  a  €  A.  By  definition  of  convexity, 


M  .  ,  1  ^  \  ^  IT  /  M  —  . 

M  +  1  oo  M  +  1  oo  ^s  ^  oa  M  +  1  ^s 


By  FJq.  (4-52),  this  becomes 


ME  (p,p)^<M  +  l)E  (-£^,p) 
OO?  ^s  OQ?  M  +  1  *  ^s 


(4-124) 


(4-125) 


for  all  p,  0  <  p  ^  1,  a  c  A,  and  p^  e  U.  From  Eqs.  (4-20),  (4-21),  and  (4-47),  we  have 


Ejvi  [p,(Pg)^\  Hg]  =-pMR^-ln  ^  p(a )  exp  [-ME^^  (p,  p  J]  .  (4-126) 


ocv  ^  s 


O'  cA 


Define  p',  p'^  by 


Em<Rs>=Em 


Then, 


Ejy,(Rg)  = -p'MR^  -  In  Yj  P(»>  exp  [-ME^^(p',  p^)] 


a  eA 


and 


(4-127) 


(4-128) 


+  2  p(a)exp[-(M4l)E^^^(j^,?')]  .(4-129) 

O'  cA 

Since  O^p’^  1,  0<p’M/(M  +  1)4  1  also,  and  by  Eqs.  (4-125),  (4-128),  and  (4-129), 

^M  +  1  t  M  +  1  '  ^M  +  1*^’ P- ^  ^M+l*^s' 

0<p<l 

peD 


Note  that  if  E’  (R  )  =  E,-(R  ),  the  monotonicity  above  carries  over  to  Elv/,(R  ).  It  is  not 
iVl  S  Ivl  S  iVl  S 

known  whether  E’  (R  )  is  always  monotone  for  MSCC  channels.  I  would  conjecture  that  the  answer 
ivl  s 

is  in  the  negative.  However,  since 
3^E  , 

ap 

— 

for  0  4  p  4  1  and  p  defined  on  ,  we  may  derive  in  a  manner  analogous  to  the  derivation  of 
Eq.  (4-125) 

E  ,  (p,  p  )  4  iE  ,  (p/f ,  p  ) 
for i an integer. Thus, again proceeding as in Theorem 4.14, we get


and 


whenever  M  =  ki. 

For  our  examples,  E,-(R  )  =  E'  (R  ).  Thus,  the  monotone  behavior  of  their  RCE‘s  may  be 
^  M  s  M  s 

observed  in  Figs.  11  and  18. 
CHAPTER 5

SYSTEMATIC  CODING  FOR  COMPLETELY  CONSTRAINED  CHANNELS 


A.  INTRODUCTION 

Our study of coding for parallel channels has thus far been confined to an exploration of the properties of the applicable random coding exponent (RCE). This exponent presupposes maximum-likelihood decoding, as has been previously stated. For any given code, with code words used equiprobably, maximum-likelihood decoding yields the minimum probability of error. Unfortunately, the amount of computational effort required to perform maximum-likelihood decoding is a positive exponential function of block length. Thus, for long codes, this effort becomes prohibitive.

Fortunately, for a class of block codes known as BCH codes [which class includes the Reed-Solomon (RS) codes], the computational effort involved in decoding can be reduced to a practical level through the use of minimum distance decoding techniques. The use of these techniques will, however, involve some sacrifice in performance relative to maximum-likelihood decoding.

In this chapter, we shall examine a class of procedures for BCH coding on a channel with parallel structure. For the case of an MSCC channel, we shall develop a set of formulas which, in combination, will enable us to calculate or bound the probability of error associated with each procedure. Although general results concerning performance will not be given, some examples are worked out at the end of the chapter.

B.  CODING  ALTERNATIVES 

The presence of a number of parallel subchannels presents us with a number of coding alternatives. One of the decisions which must be made in choosing among them is the number m† of subchannels to be coded on at once. Such a decision implies that the M subchannels will be divided into M/m sets of m subchannels each. For each such set, a code letter will be defined as the m-tuple consisting of the m subchannel inputs in the set at some one instant of time, i.e., a code letter is a member of $X_s^m$. If we choose the code alphabet to be $X_s^m$, we shall classify our coding technique as simple. However, we may wish to increase the reliability of individual code letters by choosing the code alphabet to be a proper subset of $X_s^m$, in which case we classify our coding technique as compound.

The code letters corresponding to each set of m subchannels are then encoded to form code words of length N (this is done, separately, for each set). Each set of m subchannel outputs is then separately decoded (although there may be state knowledge used in common by all sets). An error is considered to have occurred if a decoding error is made in any of the sets.

The distinction, defined above, between simple and compound coding may be made more graphic by referring to the following diagram which shows a code word of length N on M subchannels:

† We assume m divides M.
[Diagram: a code word of length N on M subchannels, with the time direction running along the code word.]

In compound coding, we code in the parallel direction (with dimensionless rate less than unity) before coding in the time direction. In simple coding, we code in the time direction only.

Simple Coding Example

Let M = 10, m = 2, N = 3, and $X_s$ = {0, 1}. Then the pairs (0, 0), (0, 1), (1, 0), and (1, 1), each written as a column, are the four possible code letters, and any sequence of three of these letters is one of the 64 possible code words.

Compound Coding Example

Let M = 10, m = 2, N = 3, and $X_s$ = {0, 1}. Let two of the four pairs above be the only two permitted code letters. Any sequence of three permitted letters is then one of the eight possible code words.

C.  DIMENSIONLESS  RATE 

Obviously, we shall be interested in comparing the performance of various coding techniques on particular MSCC channels. To make the comparisons meaningful, we must define our input and output parameters with some care. We have already done this for the probability of error, i.e., an "error" means the same thing regardless of the value of m. Suppose $X_s$ is a set consisting of L members, and that on each set of m subchannels we define $W_m$ code words of length N, with

$$1 \le W_m \le L^{mN}. \tag{5-1}$$

Then, for some real number r satisfying

$$0 \le r \le 1 \tag{5-2}$$

we have

$$W_m = L^{rmN} \tag{5-3}$$

We shall call r the dimensionless rate. If we consider each M/m-tuple of the above code words to be a code word on the whole channel, then

$$W_M = (W_m)^{M/m} = L^{rMN} \tag{5-4}$$

Since the RHS of Eq. (5-4) is independent of m, we may define

$$r = \frac{\log_L W_m}{mN} \tag{5-5}$$

as the appropriate input parameter. The dimensionless rate is related to the rate per subchannel $R_s$ by

$$R_s = r\ln L \tag{5-6}$$

where $R_s$ is in natural units.
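As a small worked illustration of Eqs. (5-5) and (5-6), the sketch below computes the dimensionless rate for the two coding examples of Sec. B (L = 2, m = 2, N = 3, with 64 and 8 code words, respectively). The function names are hypothetical and introduced only for this example.

```python
import math

def dimensionless_rate(num_code_words, L, m, N):
    """Eq. (5-5): r = log_L(W_m) / (m N)."""
    return math.log(num_code_words) / (math.log(L) * m * N)

def rate_per_subchannel_nats(r, L):
    """Eq. (5-6): R_s = r ln L, in natural units."""
    return r * math.log(L)

r_simple = dimensionless_rate(64, L=2, m=2, N=3)    # all four code letters allowed
r_compound = dimensionless_rate(8, L=2, m=2, N=3)   # only two code letters allowed

# Whole-channel check of Eq. (5-4) for M = 10: W_M = W_m ** (M / m).
M, m, N = 10, 2, 3
r_whole = math.log(8 ** (M // m)) / (math.log(2) * M * N)
```

The simple scheme has unit dimensionless rate, while restricting the alphabet to two of the four letters halves it; the whole-channel computation gives the same value as the per-set computation, as Eq. (5-4) requires.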


D.  BCH  CODES  AND  SIMPLE  CODING  SCHEMES 

The properties of BCH codes and various decoding schemes for them are developed and described in a fairly extensive literature.¹⁻³† In a BCH code, the code letters are equal to (isomorphic with) the elements of a finite (Galois) field with q elements GF(q). Such fields exist whenever $q = p^n$, where p is a prime and n is a positive integer. Hence, if a BCH code is to be used for simple coding over m subchannels of a channel with parallel structure and subchannel input alphabet {1, ..., L}, we require $L^m = p^n$ for some prime p and positive integer n. The requirement can be met if and only if $L = p^k$ for some prime p and positive integer k. Then, n = km. Thus, we must restrict our discussion to situations where the subchannel input alphabet size is an integer power of a prime.

If there is a Galois field GF(q) with q elements, then for any positive integer $\ell$ there exists an extension field $GF(q^\ell)$ with $q^\ell$ elements. Let

$$N = q^\ell - 1$$

A sequence $(u_{N-1}, \ldots, u_0)$ of code letters‡ may be represented by a polynomial u(t) of degree at most N − 1

$$u(t) = u_{N-1}t^{N-1} + \cdots + u_0, \qquad u_i \in GF(q)$$

Pick $\gamma \in GF(q^\ell)$ so that $\gamma$ is primitive, and pick d a positive integer less than N. Let the code words of a code of block length N be given by the set of polynomials of degree N − 1 or less with coefficients in GF(q) which have $\gamma, \gamma^2, \ldots, \gamma^{d-1}$ as roots. A code generated in this way is defined as a BCH code. If $\ell = 1$, the code is an RS code. If r is the dimensionless rate of the code,

$$(d - 1) \ge \frac{(1 - r)N}{\ell} \tag{5-8}$$

For RS codes,

$$(d - 1) = (1 - r)N. \tag{5-9}$$

The actual value of the ratio of (1 − r)N to (d − 1) for BCH codes with $\ell \ne 1$ may be obtained by making use of the fact that (1 − r)N is equal to the degree of the polynomial which is the least-common multiple of the minimal polynomials for the (d − 1) field elements $\gamma, \ldots, \gamma^{d-1}$. This can be a tedious calculation. Note, however, that for q = 2, we have

$$(d - 1) \ge \frac{2(1 - r)N}{\ell} \tag{5-10}$$

and the parameters of a number of such binary BCH codes are tabulated by Peterson.†

We note that the parameter d of a BCH code is not necessarily the same as its minimum distance, although it serves as a lower bound to the minimum distance.

† Numbered references appear at the end of each chapter.
‡ Note that the usual subscript order for the letters is reversed here.
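The "tedious calculation" of (1 − r)N mentioned above can be mechanized. The degree of the least-common multiple of the minimal polynomials of $\gamma, \ldots, \gamma^{d-1}$ equals the number of distinct exponents obtained by repeatedly multiplying 1, ..., d − 1 by q modulo N (the cyclotomic cosets). The sketch below uses this standard fact; the function name is hypothetical, and q is assumed to be a prime power.

```python
def bch_parity_symbols(q, ell, d):
    """Return (N, N - K) for the BCH code over GF(q) with roots
    gamma, gamma^2, ..., gamma^(d-1), gamma primitive in GF(q^ell).
    N - K = (1 - r) N is the size of the union of the cyclotomic
    cosets of the exponents 1, ..., d-1 modulo N = q^ell - 1."""
    N = q ** ell - 1
    exponents = set()
    for i in range(1, d):
        e = i % N
        while e not in exponents:   # walk the conjugates e, e*q, e*q^2, ...
            exponents.add(e)
            e = (e * q) % N
    return N, len(exponents)

# Binary codes of length 15 (q = 2, ell = 4) and an RS code over GF(16).
binary_d5 = bch_parity_symbols(2, 4, 5)
binary_d7 = bch_parity_symbols(2, 4, 7)
rs_d5 = bch_parity_symbols(16, 1, 5)
```

For the binary length-15 code with d = 5 the count is 8 parity symbols, so Eq. (5-10) holds with equality; for the RS case the count is d − 1, in agreement with Eq. (5-9).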


E.  STATE  INFORMATION  AND  RELIABILITY 

We shall restrict our consideration to those channels for which, for each $\alpha \in A$,

$$\sum_{q:\,q\ne\ell} p_\alpha(q/\ell) = f(\alpha) \tag{5-11}$$

independent of $\ell$. This will have the effect of making the probability of correct decoding by a minimum distance algorithm independent of the code word sent, and thus greatly simplify the calculation or bounding of code performance. The symmetry requirement, Eq. (5-11), is usually met in practice.

The probability of correct decoding will, in general, be affected by the choice of m in the simple coding and decoding schemes described above. It will also be affected by what state information is available at the receiver and how it is used. As was pointed out in Chapter 2, where the physical channels we are modeling are fading channels, it is usually possible to obtain partial state information at the receiver by making an energy measurement. (We may also obtain this information by using some of the channels as test channels.) This information will often enable us to assign a number representing reliability to each received letter. Suppose m subchannels are coded at once, and the reliability $b_m$ of an m-tuple received at a particular instant of time is defined as the probability of its being correctly received conditioned on whatever state information the receiver possesses. If the receiver has complete state knowledge, we have from Eq. (5-11)

$$b_m(\alpha) = [1 - f(\alpha)]^m \tag{5-12}$$

Suppose the receiver has partial channel state information represented by knowledge of a random variable $\beta$, for which

$$p(q/\ell\alpha\beta) = p(q/\ell\alpha) = p_\alpha(q/\ell) \tag{5-13}$$

and

$$p(\ell\alpha\beta) = p(\ell)\,p(\alpha\beta) \tag{5-14}$$

One can readily show that Eqs. (5-11), (5-13), and (5-14) imply

$$\sum_{q:\,q\ne\ell} p(q/\ell\beta) = g(\beta) \tag{5-15}$$

independently of $\ell$, with

$$g(\beta) = \sum_{\alpha\in A} f(\alpha)\,p(\alpha/\beta) \tag{5-16}$$

We then have

$$b_m(\beta) = [1 - g(\beta)]^m. \tag{5-17}$$

† See p. 166 of Ref. 1.

F.  MINIMUM  DISTANCE  DECODING 

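Let $X_s = Y_s$, and let $A = (Y_s^m)^N$ be the set of possible received sequences. Let g be a function defined on A × A with

$$g(\mathbf{x}, \mathbf{y}) = g(\mathbf{y}, \mathbf{x}) \ge 0 \quad \text{all } \mathbf{x}, \mathbf{y} \in A$$

$$g(\mathbf{x}, \mathbf{x}) = 0 \quad \text{all } \mathbf{x} \in A$$

and

$$g(\mathbf{x}, \mathbf{z}) \le g(\mathbf{x}, \mathbf{y}) + g(\mathbf{y}, \mathbf{z}) \quad \text{all } \mathbf{x}, \mathbf{y}, \mathbf{z} \in A$$

Then, g is a distance function. A minimum distance decoding scheme is one which decodes a received word $\mathbf{y}$ into the code word $\mathbf{x}$ for which $g(\mathbf{x}, \mathbf{y})$ is minimum.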

The simplest choice for $g(\mathbf{x}, \mathbf{y})$ is the Hamming distance, which is simply the number of code letters in which $\mathbf{x}$ and $\mathbf{y}$ differ. The Hamming distance treats each code letter equally and makes no use of reliability information. Decoding with a Hamming distance is also referred to as errors-only decoding. Efficient algorithms exist for errors-only decoding of BCH codes. These algorithms succeed whenever twice the number of code letters received in error is less than d, where d is the code parameter.

If the receiver has partial state information, it is no longer logical to use a distance function which treats all received letters equally. One may, for example, establish a reliability threshold $r_t$, $0 < r_t < 1$, and erase the i-th received letter if its reliability $b_m(\beta)$ satisfies

$$b_m(\beta) \le r_t. \tag{5-18}$$

One may then define $g(\mathbf{x}, \mathbf{y})$ as the number of non-erased positions in which $\mathbf{x}$ and $\mathbf{y}$ differ. This distance is called the Elias distance. Decoding with this distance is called erasures and errors decoding. Efficient algorithms exist which succeed whenever the number of errors e and number of erasures k satisfy

$$2e + k < d. \tag{5-19}$$

Finally, define $v(x_i, y_i)$, $x_i \in X_s^m$, and $y_i \in Y_s^m$ by

$$v(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{if } x_i \ne y_i \end{cases} \tag{5-20}$$

Then, if $b_i$ is the reliability of the i-th received code letter, and h(t), $0 \le h(t) \le 1$, is a monotone nondecreasing function of t, $0 \le t \le 1$, we may define $g(\mathbf{x}, \mathbf{y})$ by

$$g(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{N} h(b_i)\,v(x_i, y_i). \tag{5-21}$$

Decoding with this distance function is called generalized minimum distance decoding.† Efficient algorithms exist which are successful when the transmitted word $\mathbf{x}$ and received word $\mathbf{y}$ obey

$$2g(\mathbf{x}, \mathbf{y}) < d - N + \sum_{i=1}^{N} h(b_i)$$

Since explicit analytical error bounds do not exist for generalized minimum distance decoding, we shall confine our analyses to decoding with erasures, errors, or both. It should be pointed out, however, that when $b_m(\beta)$ may take on many widely separated values (e.g., 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0), generalized minimum distance decoding should promise sufficient advantage over erasures and errors decoding to justify the numerical calculation of a bound in a practical situation.

Note that the Hamming distance is obtained from Eq. (5-21) by setting

$$h(t) = 1 \quad \text{all } t,\ 0 \le t \le 1$$

The Elias distance is obtained by setting

$$h(t) = \begin{cases} 0 & \text{if } t \le r_t \\ 1 & \text{if } t > r_t \end{cases}$$
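The three distances of this section differ only in the weighting function h(t) of Eq. (5-21), which the following sketch makes explicit. The function names, the received word, and the reliabilities are hypothetical values chosen for illustration; the success test is the inequality stated after Eq. (5-21).

```python
def weighted_distance(x, y, reliab, h):
    """Eq. (5-21): sum of h(b_i) over positions where x and y differ."""
    return sum(h(b) for xi, yi, b in zip(x, y, reliab) if xi != yi)

def hamming_h(t):
    """h(t) = 1 for all t gives the Hamming distance."""
    return 1.0

def elias_h(threshold):
    """h(t) = 0 for t <= r_t (erased letter), 1 otherwise, gives the Elias distance."""
    return lambda t: 0.0 if t <= threshold else 1.0

def decoding_guaranteed(x, y, reliab, h, d):
    """Success condition 2 g(x, y) < d - N + sum_i h(b_i)."""
    N = len(x)
    return 2 * weighted_distance(x, y, reliab, h) < d - N + sum(h(b) for b in reliab)

# Assumed example: two letters in error, both received with low reliability.
x = [1, 1, 1, 1, 1, 1, 1]
y = [0, 1, 0, 1, 1, 1, 1]
reliab = [0.2, 0.9, 0.3, 0.9, 0.9, 0.9, 0.9]
d = 4
errors_only_ok = decoding_guaranteed(x, y, reliab, hamming_h, d)
erasures_ok = decoding_guaranteed(x, y, reliab, elias_h(0.5), d)
```

Here both errors fall on erased positions, so e = 0 and k = 2 satisfy Eq. (5-19) with d = 4, whereas errors-only decoding would need twice two errors to be less than 4.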


G.  SINGLE-LETTER  ERASURE  AND  ERROR  PROBABILITIES 

We are now in a position to compute the single-letter error, erasure, and correct reception probabilities ($p_e$, $p_s$, and $p_c$, respectively) for an MSCC channel when the receiver has knowledge of $\beta$, and m subchannels are coded at once. Let $\Gamma$ be the set of all possible $\beta$. For convenience, we shall assume that $\beta$ is a discrete variable. Define $\Gamma_s \subset \Gamma$ as the set of $\beta$ for which

$$b_m(\beta) \le r_t. \tag{5-22}$$

Then, let

$$p_s = \sum_{\beta\in\Gamma_s} p(\beta) \tag{5-23}$$

$$p_e = \sum_{\beta\in\bar\Gamma_s} p(\beta)\,\{1 - [1 - g(\beta)]^m\} \tag{5-24}$$

$$p_c = \sum_{\beta\in\bar\Gamma_s} p(\beta)\,[1 - g(\beta)]^m \tag{5-25}$$

where $\bar\Gamma_s$ is the complement of $\Gamma_s$.

For the special case of complete channel state knowledge, we may define $A_s$ as the set of $\alpha$ for which

$$b_m(\alpha) \le r_t \tag{5-26}$$

and $\bar A_s$ as the complement of $A_s$. Then, Eqs. (5-23), (5-24), and (5-25) hold if $\alpha$ replaces $\beta$, $f(\alpha)$ replaces $g(\beta)$, and A replaces $\Gamma$.

For the special case of no channel state knowledge,

$$p_s = 0 \tag{5-27}$$

$$p_e = \sum_{\alpha\in A} p(\alpha)\,\{1 - [1 - f(\alpha)]^m\} = 1 - p_c. \tag{5-28}$$

† See Ref. 2, pp. 12-24.
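The erasure, error, and correct-reception probabilities can be tabulated directly from the reliabilities. The sketch below is a minimal illustration assuming a discrete β with known p(β) and g(β); the dictionary layout, the function name, and the two-valued example ("good" and "bad" measurements) are hypothetical.

```python
def single_letter_probabilities(p_beta, g, m, r_t):
    """Return (p_e, p_s, p_c) following Eqs. (5-22) to (5-24), with
    p_c taken as the probability mass of non-erased, correct letters.
    p_beta[b] is p(beta) and g[b] is g(beta) of Eq. (5-16)."""
    p_e = p_s = p_c = 0.0
    for beta, pb in p_beta.items():
        reliability = (1.0 - g[beta]) ** m      # b_m(beta), Eq. (5-17)
        if reliability <= r_t:                  # erased, Eq. (5-22)
            p_s += pb
        else:
            p_e += pb * (1.0 - reliability)
            p_c += pb * reliability
    return p_e, p_s, p_c

# Assumed example values.
p_beta = {"good": 0.8, "bad": 0.2}
g = {"good": 0.01, "bad": 0.30}
p_e, p_s, p_c = single_letter_probabilities(p_beta, g, m=2, r_t=0.6)
```

With these numbers the "bad" measurement has reliability 0.49 and is erased, so all of the error probability comes from the "good" measurement.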


H.  PROBABILITY  OF  CORRECT  DECODING 

Using the probabilities derived in Sec. G above and Eq. (5-19), it is possible to calculate the probability $P_c(m)$ of correct decoding of a single set of m subchannels:

$$P_c(m) = \sum_{\substack{i,k:\\ 2i+k<d}} \frac{N!}{i!\,k!\,(N-i-k)!}\; p_e^{\,i}\, p_s^{\,k}\, p_c^{\,N-i-k} \tag{5-29}$$

Equation (5-29) assumes erasures and errors decoding, but reduces to errors only or erasures only if $p_s = 0$ or $p_e = 0$, respectively.
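Equation (5-29) is a finite multinomial sum and is straightforward to evaluate for moderate N. The following sketch, with a hypothetical function name and example values, sums over the numbers of errors i and erasures k that satisfy 2i + k < d.

```python
from math import factorial

def prob_correct_decoding(N, d, p_e, p_s, p_c):
    """Eq. (5-29): probability that 2i + k < d, where i counts errors
    and k counts erasures among N independent received letters."""
    total = 0.0
    for i in range(N + 1):
        for k in range(N + 1 - i):
            if 2 * i + k < d:
                coeff = factorial(N) // (factorial(i) * factorial(k) * factorial(N - i - k))
                total += coeff * p_e ** i * p_s ** k * p_c ** (N - i - k)
    return total

# Errors-only special case (p_s = 0): N = 3, d = 3 tolerates at most one error.
errors_only = prob_correct_decoding(3, 3, p_e=0.1, p_s=0.0, p_c=0.9)
```

In the errors-only case the sum reduces to the familiar binomial expression 0.9³ + 3(0.1)(0.9)², which gives a quick check on the implementation.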

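I. CHERNOFF BOUND

It is often difficult to evaluate Eq. (5-29). To evaluate conveniently the performance of BCH codes, we shall need to use the Chernoff bound.$^4$ Let $u_i$, $1 \le i \le N$, be a set of independent identically distributed random variables with mean $\bar{u}$; let $\epsilon > 0$ and $\lambda = \bar{u} + \epsilon$; let $\sigma_{-1}(t)$ be defined by

$$\sigma_{-1}(t) = \begin{cases} 1 & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 . \end{cases} \tag{5-30}$$

Then,

$$P\left(\sum_{i=1}^{N} u_i \ge N\lambda\right) = E\left[\sigma_{-1}\left(\sum_{i=1}^{N} u_i - N\lambda\right)\right] . \tag{5-31}$$

For any $s \ge 0$,

$$\sigma_{-1}(t) \le \exp[st] . \tag{5-32}$$

Thus,

$$E\left[\sigma_{-1}\left(\sum_{i=1}^{N} u_i - N\lambda\right)\right] \le E\left[\exp\left\{s\left(\sum_{i=1}^{N} u_i - N\lambda\right)\right\}\right] \tag{5-33}$$

and

$$E\left[\exp\left\{s\left(\sum_{i=1}^{N} u_i - N\lambda\right)\right\}\right] = E\left[\prod_{i=1}^{N} \exp[s(u_i - \lambda)]\right] \tag{5-34}$$

$$= \left(E\{\exp[s(u - \lambda)]\}\right)^N \tag{5-35}$$

where we use the fact that the $u_i$ are independent and identically distributed. Thus, if

$$D(s,\lambda) = -\ln\left(E\{\exp[s(u - \lambda)]\}\right) \tag{5-36}$$

we have for $s \ge 0$,

$$P\left(\sum_{i=1}^{N} u_i \ge N\lambda\right) \le \exp[-N D(s,\lambda)] . \tag{5-37}$$

The bound is tightest for s satisfying

$$\frac{d}{ds} E\{\exp[s(u - \lambda)]\} = E\{(u - \lambda)\exp[s(u - \lambda)]\} = 0$$

or

$$\lambda = \frac{E(u \exp[su])}{E(\exp[su])} . \tag{5-38}$$

Since $\lambda > \bar{u}$, a unique positive solution $s_0$ to Eq. (5-38) is guaranteed to exist† if the variance of u is positive.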
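
The optimization in Eq. (5-38) can be carried out numerically for any finite-valued random variable. The following Python sketch is an illustration rather than the report's own procedure: it uses bisection (our choice of method) on the tilted mean $E(u \exp[su])/E(\exp[su])$, which is nondecreasing in s, and then evaluates $D(s_0,\lambda)$ from Eq. (5-36). The function name and the search limit `s_hi` are assumptions.

```python
from math import exp, log


def chernoff_exponent(values, probs, lam, s_hi=50.0, iters=200):
    """Return (s0, D) where s0 solves Eq. (5-38) and D = D(s0, lam) of Eq. (5-36)."""
    mean = sum(v * p for v, p in zip(values, probs))
    if not (mean < lam < max(values)):
        raise ValueError("need mean < lam < max(values) for a finite positive s0")

    def tilted_mean(s):
        weights = [p * exp(s * v) for v, p in zip(values, probs)]
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    lo, hi = 0.0, s_hi
    for _ in range(iters):          # bisection on the monotone tilted mean
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < lam:
            lo = mid
        else:
            hi = mid
    s0 = 0.5 * (lo + hi)
    mgf = sum(p * exp(s0 * (v - lam)) for v, p in zip(values, probs))
    return s0, -log(mgf)
```

For a two-valued variable the result can be checked against the closed form: with values (0, 1), probabilities (0.75, 0.25), and $\lambda = 0.5$, Eq. (5-38) gives $e^{s_0} = 3$.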


J.  CHERNOFF  BOUNDS  FOR  ERASURES  AND/OR  ERRORS  DECODING 
Theorem 5.1. (Chernoff Bound for Errors and Erasures)

Suppose we use a BCH code of block length N and parameter d over a channel with erasure probability $p_s$, error probability $p_e$, and probability of correct reception $p_c$. Let

$$p_c + p_s + p_e = 1 \tag{5-39}$$

$$p_e > 0 \tag{5-40}$$

and

$$2p_e + p_s < \frac{d}{N} = t . \tag{5-41}$$

Then, the best Chernoff bound to the probability that decoding will fail, $P_e$, is given as follows:

$$P_e \le \exp[-N D(d)] \tag{5-42}$$

where

$$D(d) = s_0 t - \ln\left(p_c + p_s e^{s_0} + p_e e^{2 s_0}\right) \tag{5-43}$$

and

$$s_0 = \ln\left\{ \frac{-p_s(t-1)}{2p_e(t-2)} + \sqrt{\left[\frac{p_s(t-1)}{2p_e(t-2)}\right]^2 + \frac{p_c\,t}{p_e(2-t)}} \right\} . \tag{5-44}$$

† This is proved in essentially the same way as the similar result in part (b) of Theorem 4.8 in Chapter 4. $E(\exp[su])$ is the moment generating function associated with the random variable u.


Proof.

By Eq. (5-19), the probability of error in decoding is at most the probability that $2e + k \ge d$.

Now, define random variables $u_i$, $1 \le i \le N$, as follows:

$u_i = 0$ with probability $p_c$
$u_i = 1$ with probability $p_s$
$u_i = 2$ with probability $p_e$

These N random variables will be assumed to be independently distributed. Clearly,

$$P(2e + k \ge d) = P\left(\sum_{i=1}^{N} u_i \ge Nt\right) . \tag{5-45}$$

The RHS of Eq. (5-45) may be Chernoff bounded as per Eqs. (5-36), (5-37), and (5-38), where $\lambda = t$. Now,

$$E\left(e^{s_0 u}\right) = p_c + p_s e^{s_0} + p_e e^{2 s_0} \tag{5-46}$$

$$E\left(u\,e^{s_0 u}\right) = p_s e^{s_0} + 2 p_e e^{2 s_0} . \tag{5-47}$$

Thus, by Eq. (5-38),

$$t = \frac{p_s e^{s_0} + 2 p_e e^{2 s_0}}{p_c + p_s e^{s_0} + p_e e^{2 s_0}} . \tag{5-48}$$

Equation (5-48) is a quadratic in $e^{s_0}$. When it is solved and the natural logarithm is taken, the RHS of Eq. (5-44) results. From Eq. (5-36),

$$D = -\ln E\left[e^{s_0(u - t)}\right] = s_0 t - \ln E\left[e^{s_0 u}\right] = s_0 t - \ln\left(p_c + p_s e^{s_0} + p_e e^{2 s_0}\right) .$$
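
As a numerical check on the closed form, the sketch below evaluates $s_0$ and $D(d)$ from Eqs. (5-44) and (5-43) and can be used to confirm that the result satisfies Eq. (5-48). It is written in Python purely for illustration; the function name `theorem_5_1_exponent` is an assumption, and the input check also requires t < 2, which is needed for the term $p_c t / [p_e(2-t)]$ to be positive.

```python
from math import log, sqrt


def theorem_5_1_exponent(p_s, p_e, t):
    """Return (s0, D) from Eqs. (5-44) and (5-43) for t = d/N."""
    p_c = 1.0 - p_s - p_e
    if not (p_e > 0 and p_c >= 0 and p_s + 2 * p_e < t < 2):
        raise ValueError("conditions (5-39)-(5-41) with t < 2 are not met")
    half = -p_s * (t - 1) / (2 * p_e * (t - 2))
    # x = exp(s0): the positive root of the quadratic implied by Eq. (5-48)
    x = half + sqrt(half * half + p_c * t / (p_e * (2 - t)))
    s0 = log(x)
    D = s0 * t - log(p_c + p_s * x + p_e * x * x)
    return s0, D
```

For example, `theorem_5_1_exponent(0.1, 0.05, 0.5)` returns a positive `s0` and a positive exponent, as Theorem 5.1 requires when Eq. (5-41) holds.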


Theorem 5.2. (Chernoff Bound for Erasures Only)

Suppose we use a BCH code of block length N and parameter d over a channel with erasure probability $p_s$ and probability of correct reception $(1 - p_s)$. Suppose

$$0 < p_s < \frac{d}{N} = t < 1 . \tag{5-49}$$

Then, the probability that decoding will fail, $P_e$, is bounded as follows:

$$P_e \le \exp\{-N[-t \ln p_s - (1 - t)\ln(1 - p_s) - H(t)]\} \tag{5-50}$$

where

$$H(t) = -t \ln t - (1 - t)\ln(1 - t) . \tag{5-51}$$

Proof.

The proof proceeds as that of Theorem 5.1 up to Eq. (5-48), which is now linear in $e^{s_0}$. Substituting the value of $s_0$ so obtained into Eq. (5-36) gives our result.

Theorem 5.3. (Chernoff Bound for Errors Only)

Suppose we use a BCH code of block length N and parameter d over a channel with error probability $p_e$ and probability of correct reception $(1 - p_e)$. Suppose

$$0 < 2p_e < \frac{d}{N} = t < 1 . \tag{5-52}$$

Then, the probability that decoding will fail, $P_e$, is bounded as follows:

$$P_e \le \exp\left\{-N\left[-\tfrac{t}{2}\ln p_e - \left(1 - \tfrac{t}{2}\right)\ln(1 - p_e) - H(t/2)\right]\right\} \tag{5-53}$$

where $H(t)$ is given by Eq. (5-51).

Proof.

This is really a corollary of Theorem 5.1. We set $p_s = 0$, $p_c = 1 - p_e$, and substitute the result in Eqs. (5-43) and (5-42) to obtain our result.
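
The bracketed exponents of Eqs. (5-50) and (5-53) are straightforward to compute. The Python sketch below is illustrative only; the function names are assumptions. It also makes visible that the errors-only exponent at threshold t has the same form as the erasures-only exponent at threshold t/2.

```python
from math import log


def entropy_nats(t):
    """H(t) of Eq. (5-51), in natural units, for 0 < t < 1."""
    return -t * log(t) - (1 - t) * log(1 - t)


def erasures_only_exponent(p_s, t):
    """Bracketed expression of Eq. (5-50); t = d/N."""
    return -t * log(p_s) - (1 - t) * log(1 - p_s) - entropy_nats(t)


def errors_only_exponent(p_e, t):
    """Bracketed expression of Eq. (5-53); t = d/N."""
    h = t / 2
    return -h * log(p_e) - (1 - h) * log(1 - p_e) - entropy_nats(h)
```

Both exponents vanish when the threshold equals the channel probability (t = p_s, or t/2 = p_e) and are positive when conditions (5-49) or (5-52) hold.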


K.  BOUNDS  ON  TOTAL  PROBABILITY  OF  DECODING  FAILURE 

We  have  one  further  topic  we  must  explore  before  proceeding  with  the  analysis  of  simple 
coding  schemes  on  some  specific  MSCC  channels.  Earlier,  we  agreed  to  count  the  decoding  as 
being  in  error  if  a  decoding  failure  occurs  in  any  one  of  the  M/m  =  S  sets  of  m  subchannels 
each,  where  m  subchannels  are  encoded  at  once.  If  the  probability  of  error  P^(m)  is  computed 
or  bounded  above  for  each  such  set,  separately,  then  the  total  probability  of  error  P^  may  be 
estimated by the use of the union bound. Thus, if for each set of m subchannels

$$P_e(m) \le \epsilon , \tag{5-54}$$

we have

$$P_e \le S\,P_e(m) \le S\epsilon . \tag{5-55}$$

Since, at each instant of time, all the subchannels of an MSCC channel are required to be in the same state, we would normally expect that errors in the various subchannel sets would tend to occur together, and thus that a better estimate than the union bound exists. Suppose $\alpha$ is a sequence of N subchannel states, and $A^N$ is the set of such sequences. Then, we have

$$P_e = 1 - \sum_{\alpha \in A^N} p(\alpha)\,[1 - P_e(m/\alpha)]^S \tag{5-56}$$

where $P_e(m/\alpha)$ is the probability of decoding failure on a single set of m subchannels when the sequence of subchannel states was $\alpha$. Equation (5-56) is obtained from the elementary rules of probability after noting that when the state vector $\alpha$ is given, the subchannels become independent of each other. The principal limitation on the use of Eq. (5-56) is the fact that even though $P_e(m/\alpha)$ depends only on the number of each subchannel state in the sequence $\alpha$, it is not generally easy to compute or bound. Equation (5-56) is most easily used when $P_e(m/\alpha)$ is equal either to zero or to unity for all $\alpha$. Then,

$$P_e = P_e(m) \quad \text{all } m . \tag{5-57}$$

This situation occurs in erasures-only decoding. Equation (5-57) reflects the fact that decoding will fail if there are d or more erasures, and that the same number of erasures occur for each set of m subchannels because our channel is MSCC.

L.  ERROR  EXPONENTS  FOR  SOME  EXAMPLES  OF  MSCC  CHANNELS 

We shall now compute error bounds for simple BCH coding on some particular MSCC channels. The subchannel input and output alphabets will be binary, and the number of subchannels M will be seven. Since seven is prime, we shall have only two simple coding alternatives to consider: first, to code and decode on all seven subchannels simultaneously, using an RS code for which the alphabet size is $2^7 = 128$, and block length is $2^7 - 1 = 127$; second, to code and decode each subchannel separately, using a binary BCH code of the same block length. The relationship among d, r, and N for the RS code is given by Eq. (5-9); for the binary BCH code, the relationship is obtained from Table 9.1 of Ref. 1. The results we need are summarized in Table I.

We define a new exponent, B(r), by

$$B(r) = D[d(r)] - \frac{\ln S}{127} \tag{5-58}$$

and an upper bound to the total probability of decoding failure $P_e$ by

$$\bar{P}_e = \exp[-127B(r)] . \tag{5-59}$$

Hence, we have

$$P_e \le \exp[-127B(r)] = \bar{P}_e . \tag{5-60}$$


TABLE I
RATE AND MINIMUM DISTANCE FOR SOME BINARY BCH CODES (N = 127)

   r × N     d        r × N     d
    120      3          57     23
    113      5          50     27
    106      7          43     29
     99      9          36     31
     92     11          29     43
     85     13          22     47
     78     15          15     55
     71     19           8     63
     64     21

Fig. 24. Exponent vs dimensionless rate for Reed-Solomon code - Example 1.

Example 1

M = 7; State Known at Receiver

   α        1      2
   p(α)    1/2    1/2

[Subchannel transition diagrams $p_\alpha(y/x)$ for states 1 and 2.]

We note that the erasures-only bounds are applicable here, and $p_s = 1/2$. For the RS code, $r \le 1/2$ implies that Eq. (5-49) holds and we may compute a positive error exponent using Eqs. (5-50) and (5-51). The result is plotted in Fig. 24.

Now, we consider a binary BCH code on one subchannel of this channel, and note that the expected number of erasures is 63.5. Even the lowest rate binary BCH code given in Table I corrects at most 62 erasures. Hence, the probability of decoding failure exceeds 0.5 for all positive rates. Thus, coding over all subchannels at once is clearly a superior procedure here.
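
The two claims in Example 1 can be checked numerically. The Python sketch below is an illustration under stated assumptions: the RS code is taken to satisfy d = N - K + 1 for K information symbols (the standard maximum-distance-separable relation, assumed here to be what Eq. (5-9) expresses), and the function names are ours. The binary BCH case uses the lowest-rate entry of Table I (d = 63), for which decoding fails when 63 or more of the 127 letters are erased.

```python
from math import comb, log

N = 127  # block length used throughout Section L


def erasure_tail(n, d, p_s):
    """P(number of erasures >= d) for n independent letters, each erased w.p. p_s."""
    return sum(comb(n, k) * p_s**k * (1 - p_s)**(n - k) for k in range(d, n + 1))


def rs_erasure_exponent(K, p_s=0.5, n=N):
    """Bracketed exponent of Eq. (5-50) for an RS code, assuming d = n - K + 1."""
    t = (n - K + 1) / n
    H = -t * log(t) - (1 - t) * log(1 - t)
    return -t * log(p_s) - (1 - t) * log(1 - p_s) - H


# Failure probability of the d = 63 binary BCH code on one subchannel
bch_failure = erasure_tail(N, 63, 0.5)
```

Because the binomial distribution with p = 1/2 and N = 127 is symmetric about 63.5, `bch_failure` is one half plus the probability of exactly 63 erasures, which is why it exceeds 0.5; `rs_erasure_exponent(K)` is positive for 2 ≤ K ≤ 64.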

Example 2

M = 7; State Unknown at Receiver

   α        1      2
   p(α)    1/2    1/2

c = 0.05, 0.10, 0.15, 0.20

For the RS codes,

$$p_e = \tfrac{1}{2}\,[1 - (1 - c)^7] . \tag{5-61}$$

For each value of c, Eqs. (5-52), (5-61), and (5-9) tell us at which rates a positive exponent may be expected. If Eq. (5-52) is satisfied, the exponent is the expression in square brackets in Eq. (5-53). These exponents are plotted in Fig. 25 for c = 0.05, 0.10, 0.15, and 0.20 (curves labeled S).

For the BCH codes,

$$p_e = c/2 . \tag{5-62}$$

For each value of c, Eqs. (5-52), (5-62), and Table I tell us at which rates a positive exponent may be expected in the bound for $P_e(m)$. If Eq. (5-52) is satisfied, the exponent D(r) is the expression in square brackets in Eq. (5-53). We plot the exponent B(r) given by Eq. (5-58) whenever it is positive; plots are shown in Fig. 25 for c = 0.05, 0.10, 0.15, and 0.20 (curves labeled B).

Fig. 25. Exponent vs dimensionless rate for Reed-Solomon (S) and binary BCH (B) codes on several MSCC channels - Example 2.

We see from the curves that, for c = 0.10, 0.15, and 0.20, the binary BCH exponent is greater than the RS exponent at all rates. Hence, a binary BCH code would be our choice for these examples. In addition, the binary BCH code is easier to implement. If c = 0.05, it appears as though the favored code depends on the rate. (Note that the curves for the binary BCH codes serve only to connect the data points and have no meaning between them. Thus, the RS and binary BCH codes may only be compared at the data points for the latter.)

Now, we can examine the reasons why we have obtained the above results. When we increase the number of subchannels coded over at once, two things happen. First, the alphabet size increases, with a resultant increase in the code parameter d for a fixed dimensionless rate and block length N. The largest value of d for a fixed dimensionless rate and block length is that given by Eq. (5-9) and is achieved for an alphabet size one greater than the block length. Second, the probability of a code letter being received in error (or erased if the receiver has state knowledge) increases [see Eqs. (5-17), (5-22), (5-23), and (5-24)], with a resultant increase in the expected number of errors and erasures. Example 1 is a special case in that the erasure probability remains the same regardless of how many subchannels are coded over. Hence, the increase in alphabet size is entirely beneficial, and the RS code is superior. In Example 2, the increase in error probability with the number of subchannels coded over is the dominant effect for c = 0.10, 0.15, and 0.20. Unlike the situation in Example 1, we can only determine this after making a calculation.

In  general,  we  would  expect  the  optimal  number  of  subchannels  coded  over  at  once  to  lie 
somewhere  between  1  and  M.  The  determination  of  this  optimum  number  (which  may  depend 
upon  the  rate),  in  more  general  situations  than  we  have  considered,  involves  considerable  labor. 
This  labor  is  primarily  due  to  the  difficulty  of  finding  the  precise  relationship  among  code  pa¬ 
rameter,  dimensionless  rate,  and  block  length  for  non-binary,  non-RS  BCH  codes.  Inequality 
(5-8) is of some help here. If, assuming (5-8) were satisfied with equality, we compute our exponent and find it to be greater (for the same block length) than that for an RS code, then we are secure in concluding that the RS code is not best.

M.  COMPOUND  CODING 

The problem mentioned above, of an increase in code letter probability of error with increasing number of subchannels, has an obvious solution: to code across the subchannels before coding along them. This is what we have called compound coding.† The number of channel code letters A (i.e., the number of possible input M-tuples at a given instant of time) is given by

$$A = L^{r_1 M}$$

for some $0 \le r_1 \le 1$. The number of channel code words W is given by

$$W = A^{r_2 N} = L^{r_1 r_2 M N}$$

for some $0 \le r_2 \le 1$. Thus, $r_1 r_2$ for compound coding is comparable to r for simple coding.

Compound coding does not seem to be an attractive technique for MSCC channels. In the first place, we must make $r_1 \ln L$ smaller than the capacity (in natural units) of the worst subchannel state whose reliability we wish to improve. Furthermore, for a number of subchannels of order 100, the improvement in reliability is generally not too marked unless $r_1 \ln L$ is one-half or less the capacity of the worst subchannel state. Thus, compound coding is generally applicable only to low rates. That the exponents obtained even at these low rates are not generally as large as those obtained for simple coding is not so obvious. Indeed, we cannot be sure that compound coding is not advantageous in some instances, although intuition suggests that coding in a direction in which we have no "diversity" will not be advantageous.

† This resembles the approach taken in concatenated coding (see Ref. 2).
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CHAPTER  6 

MORE  GENERAL  CHANNEL  MODELS 


In  Chapters  4  and  5,  we  restricted  our  discussion  to  channels  which  are  MSCC.  Roughly 
speaking,  such  channels  have  three  main  properties:  state  structure,  memorylessness,  and 
identical  subchannel  states  at  each  instant  of  time.  We  shall  here  consider  channels  in  which 
the  second  and  third  properties  may  not  obtain. 

A.  MARKOV  PARALLEL  CHANNEL 

Let $p_{ij}$ be such that

$$0 \le p_{ij} \le 1 , \quad 1 \le i, j \le S$$

and

$$\sum_{j=1}^{S} p_{ij} = 1 , \quad 1 \le i \le S . \tag{6-1}$$

Let $\{u_i\}_{i=1}^{S}$ be a solution of

$$u_j = \sum_{i=1}^{S} u_i p_{ij} , \quad 1 \le j \le S . \tag{6-2}$$

$\{u_i\}$ is called a stationary distribution. Let

$$u_i p_{ij} = u_j p_{ji} , \quad 1 \le i, j \le S . \tag{6-3}$$

Definition

A Markov Parallel Channel (MPC) is an MS channel with $A = \{1, \ldots, S\}$ in which the probability $p(e_1, \ldots, e_k)$ that any k successive subchannels are in states $e_1, \ldots, e_k$, respectively, is given by

$$p(e_1, \ldots, e_k) = u_{e_1}\, p_{e_1 e_2} \cdots p_{e_{k-1} e_k} \tag{6-4}$$

irrespective‡ of the direction of progression across the subchannels.
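
The quantities in this definition are easy to compute for a small example. The Python sketch below is illustrative and not taken from the report: it finds a stationary distribution by repeated application of Eq. (6-2) (power iteration, our choice of method, which assumes the chain converges from a uniform start), checks the consistency condition (6-3), and evaluates Eq. (6-4). States are indexed from 0 rather than 1, and all function names are assumptions.

```python
def stationary_distribution(P, iters=2000):
    """Iterate u <- uP (Eq. 6-2) from a uniform start."""
    S = len(P)
    u = [1.0 / S] * S
    for _ in range(iters):
        u = [sum(u[i] * P[i][j] for i in range(S)) for j in range(S)]
    return u


def is_consistent(P, u, tol=1e-9):
    """Check Eq. (6-3): u_i p_ij == u_j p_ji for all i, j."""
    S = len(P)
    return all(abs(u[i] * P[i][j] - u[j] * P[j][i]) <= tol
               for i in range(S) for j in range(S))


def state_sequence_prob(P, u, states):
    """Eq. (6-4): probability that successive subchannels are in the given states."""
    prob = u[states[0]]
    for a, b in zip(states, states[1:]):
        prob *= P[a][b]
    return prob
```

With `P = [[0.9, 0.1], [0.3, 0.7]]` the stationary distribution is (0.75, 0.25), condition (6-3) holds, and the sequence (0, 1, 1) has the same probability as its reversal (1, 1, 0), which is the direction independence required by the definition.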

The channel we have just defined has relatively simple subchannel dependencies while not requiring the states of all the subchannels to be the same during a single-letter transmission. The MSCC channel is the special case of the MPC with $p_{ij} = \delta_{ij}$. By our remarks on time-parallel duality in Chapter 2-13, we should be able to make use of known results on single channels with a Markov state dependence in time§ to analyze the MPC.

† Numbered references appear at the end of each chapter.

‡ Equation (6-3) is the consistency condition which allows this (see Ref. 1).

§ The channel with Markov state dependence in time is called a "discrete finite state channel" in Ref. 2.
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B.  CAPACITY  OF  MPC 


The first problem we might wish to consider is that of the computation of the capacity of the MPC. Unfortunately, there is no simple formula for capacity in terms of the channel parameters given. Gilbert computes the capacity of a single channel with Markov state dependence in time for the special case where both input and output are binary and there are only two channel states. The first state guarantees that the output and input be the same. In the second state, all transition probabilities are one-half. (The states correspond to $p_1(\xi/\eta)$ and $p_2(\xi/\eta)$ in Example 1, Chapter 3.) Since Gilbert's result is a capacity per use of the channel defined by a limiting process, Theorem 2.5 suggests that his results will serve as upper bounds to the capacity per subchannel $C_{sM}$. In general, Theorems 2.3, 2.4, 2.7, and 3.1 may be combined to obtain upper and lower bounds to $C_{sM}$ which are relatively easily calculable. These are given by

$$\max_{p} \sum_{j=1}^{S} u_j I_j(X_s;Y_s) + \frac{1}{M}\sum_{j=1}^{S} u_j \log u_j + \left(1 - \frac{1}{M}\right)\sum_{i=1}^{S}\sum_{j=1}^{S} u_i p_{ij} \log p_{ij} \;\le\; C_{sM} \;\le\; \max_{p} \sum_{j=1}^{S} u_j I_j(X_s;Y_s) \tag{6-5}$$

where, if

$$X_s = \{1, \ldots, L\} , \quad Y_s = \{1, \ldots, Q\}$$

then,

$$I_j(X_s;Y_s) = \sum_{\ell=1}^{L}\sum_{q=1}^{Q} p(\ell)\, p_j(q/\ell)\, \log\frac{p_j(q/\ell)}{\sum_{k=1}^{L} p_j(q/k)\, p(k)} . \tag{6-6}$$

[Recall that j is the subchannel state, and $p(\ell)$ is the subchannel input distribution.]


C.  RANDOM  CODING  EXPONENT  FOR  MPC 

Unfortunately, no results are available concerning maximum-likelihood RCE's for Markov channels. Yudkin† derives RCE's for Markov channels with a type of nonmaximum-likelihood decoding. By duality, his results carry over to the MPC without essential change.


D.  SYSTEMATIC  CODING  FOR  MPC 

In contrast to the situation which exists for the RCE, the performance of BCH codes with simple coding schemes and minimum distance decoding can be evaluated almost as readily for the MPC as for the MSCC. We continue to assume that Eq. (5-11) holds. What is changed is our computation of the reliability $b_m$ (probability of correct reception) of an m-tuple of subchannel inputs. Suppose, for convenience, we choose the m-tuple to consist of the first m subchannel inputs. The state vector we are concerned with is $\alpha = (\alpha_1, \ldots, \alpha_m)$. Thus, for the case of complete state knowledge,

$$b_m(\alpha) = \prod_{i=1}^{m} [1 - f(\alpha_i)] . \tag{6-7}$$

If we have no state knowledge,

$$b_m = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_m \in A} u_{\alpha_1}\, p_{\alpha_1 \alpha_2} \cdots p_{\alpha_{m-1} \alpha_m} \prod_{i=1}^{m} [1 - f(\alpha_i)] . \tag{6-8}$$

† See Ref. 2, Chapter IV.
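
The m-fold sum in Eq. (6-8) need not be enumerated term by term; because the state sequence is a Markov chain, it can be accumulated one subchannel at a time. The Python sketch below is an illustration of that recursion under the reading of Eq. (6-8) in which $f(\alpha_i)$ is the subchannel error probability in state $\alpha_i$; the function name and zero-based state indexing are assumptions.

```python
def reliability_no_state_knowledge(P, u, f, m):
    """Evaluate Eq. (6-8) by a forward recursion over the m subchannels.

    P[a][b] : transition probability between states of adjacent subchannels
    u[a]    : stationary probability of state a
    f[a]    : probability that a subchannel letter is in error in state a
    """
    S = len(u)
    # v[a] = P(first subchannel in state a and received correctly)
    v = [u[a] * (1.0 - f[a]) for a in range(S)]
    for _ in range(m - 1):
        v = [sum(v[a] * P[a][b] for a in range(S)) * (1.0 - f[b])
             for b in range(S)]
    return sum(v)
```

With the identity matrix for `P` (the MSCC special case), two equiprobable states, and `f = [0, c]`, the result is $\tfrac{1}{2} + \tfrac{1}{2}(1 - c)^m$, which agrees with Eq. (5-61) for m = 7 through $p_e = 1 - b_m$.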


Suppose  we  have  a  random  variable  (S  -  (/3^,  .  .  .  representing  partial  knowledge,  with 

p(y/xcv/3)  =  p(y/xQ')  =  p_^(y/x)  (6-9) 


p{xaP  )  =  p(x)  p(o/8  ) 


and 


=  Pi(y/x./3.) 


for  all  i,  1  <  i  <  m.  Then,  if  we  define  by 


=  Z  Pi(yiA./3i’ 


(6-10) 


(6-1  1) 


(6-12) 


y  .:y.^x. 


g^(/3^)  is  independent  of  x^,  and 


(6-13) 


O'  .  €  A 
1 


.th 


g^(/S^)  is  the  probability  that  the  i  subchannel  symbol  will  be  incorrectly  received  given  that 
the  i^^  component  of  the  partial  knowledge  vector  is  /?..  Thus,  if  the  receiver  knows  ft, 


i=l 

The  erasure  criterion  is  of  the  form 

b  ip)  ^  r, 
m  ^  t 


(6-14) 


(6-15) 


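If we define $\Gamma$ as the set of $\beta$ for which Eq. (6-15) holds, Eqs. (5-23), (5-24), and (5-25) become

$$p_s = \sum_{\beta \in \Gamma} p(\beta) \tag{6-16}$$

$$p_e = \sum_{\beta \in \bar{\Gamma}} p(\beta)\left\{1 - \prod_{i=1}^{m} [1 - g_i(\beta_i)]\right\} \tag{6-17}$$

$$p_c = 1 - p_s - p_e \tag{6-18}$$

where $\bar{\Gamma}$ is the complement of $\Gamma$. If the receiver has no channel state knowledge, Eqs. (5-27) and (5-28) become

$$p_s = 0 \tag{6-19}$$

$$p_e = \sum_{\alpha \in A^m} p(\alpha)\left\{1 - \prod_{i=1}^{m} [1 - f(\alpha_i)]\right\} = 1 - p_c . \tag{6-20}$$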

E.  OTHER  MS  CHANNELS 

Needless to say, the MPC and MSCC channel are not the only possible MS channels. However, together with the independent subchannel case, they are the only MS channels for which random coding results of any generality are known. If, for some reason, it seems desirable to use some other MS channel model, numerical computation may provide the only guide to the behavior of the RCE. The performance of minimum distance decoding of BCH codes may nonetheless be evaluated with an effort comparable to that involved in a similar evaluation for the MPC. The details of this evaluation involve an obvious extension of the material in the preceding section.

F.  CHANNELS  WITH  BOTH  TIME  AND  PARALLEL  DEPENDENCIES 

Thus far, we have considered only memoryless channels or, equivalently, channels with memory but no parallel dependencies. An obvious generalization is to channels with dependencies in both the time and parallel directions. We shall assume a state structure where A is the set of subchannel states, and a conditional probability distribution $p_\alpha(\xi/\eta)$, $\xi \in Y_s$, $\eta \in X_s$, is associated with each $\alpha \in A$.

G.  BLOCK  MODEL 

Suppose we have a channel consisting of $M_1$ subchannels which, at each transmission instant, are all in the same state. Suppose, too, that there is an integer $M_2$ such that for any integer k, the state which is in effect at time $kM_2 + 1$ must persist until time $(k + 1)M_2$, and that the state corresponding to each value of k is independent of all the others.† The channel is cyclostationary rather than stationary, because a change in state may occur only at specified times.

The significant facts about the channel are that a block of

$$M = M_1 M_2 \tag{6-21}$$

subchannel letters is transmitted while the corresponding subchannel states are all the same, and that the state for each block is independent of the states for the others. Hence, we may make
use  of  our  MSCC  results  for  the  block  model.  Let  be  the  rate  per  block  of  length  M^.  Then, 
the  rate  per  subchannel  per  channel  use  is  given  by 


t  Note  that  if  =  ] ,  we  have  the  dual  of  the  MSCC  channel.  This  serves  as  a  simple  model  for  a  single  channel 
with  memory. 
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(6-22) 


R  -  ^  ^ 

"s  M,  M 

1 

Results  concerning  in  Chapter  4  remain  true,  where  M  is  now  the  number  of  subchannel 

letters  in  the  block.  We  have  also,  for  block  codes  of  length  that 

Pg  <  exp  (-NEj^(Rg)]  (6-23) 

where  E^^(R^)  is  precisely  the  same  as  in  Chapter  4,  with  M  given  by  Eq.  (6-21). 

H.  CONSTRAINED -MARKOV  MODEL 

Suppose we have a channel consisting of M subchannels where, at each transmission instant, all the subchannels must be in the same state. Suppose, further, that the state sequence in time is a Markov chain. Then, using the results of Yudkin,$^2$ we might hope to pursue a line of reasoning similar to that in Chapter 4 to prove theorems such as those in Chapter 4 for nonmaximum-likelihood decoding of block codes on this channel. This seems like a promising area for future research.

Obviously, the constrained-Markov model has a dual. This dual has a Markov state dependence in the parallel direction. In the time direction, each subchannel state persists for a "block length" of $M_2$ uses of the channel. At the start of a new block, the set of subchannel states is chosen independently of prior states according to the Markovian rule given. This dual seems less attractive as a model for physical channels than the constrained-Markov model itself.

I.  OTHER  MODELS 

Clearly,  any  one-dimensional  discrete-time  random  process  which  is  not  independent  from 
shot  to  shot  may  be  combined  with  complete  constraint  in  the  parallel  direction  to  yield  a  state 
process  for  a  channel  with  both  time  and  parallel  dependencies.  Since  general  results  concern¬ 
ing  the  single-subchannel  versions  of  such  channels  are  not  available,  one  would  anticipate  dif¬ 
ficulty  in  analyzing  the  multiple-subchannel  case. 

When  we  consider  the  case  of  channels  which  are  neither  completely  constrained  nor  in¬ 
dependent  in  either  the  time  or  the  parallel  direction,  it  becomes  difficult  even  to  find  simple 
models  for  the  underlying  state  process.  Here,  the  prospect  for  other  than  numerical  results 
is  slim  indeed,  and  even  numerical  results  can  only  be  obtained  with  great  difficulty. 
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t  We  assume  that  the  first  letter  of  each  code  word  occurs  at  time  kM2  +  1 ,  for  some  integer 


k. 
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APPENDIX  A 

CHANNELS  WHICH  ARE  MC  BUT  NOT  MS,  AND  RELATED  TOPICS 


Theorem A.1.

There exists a two-subchannel MC channel with $X_s = Y_s = \{0, 1\}$ which is not MS.

Proof.

Let the conditional probability distribution $p(y_1 y_2 / x_1 x_2)$ of a 2 × 2 × 2 MC channel be given by the entries in the following matrix:

                     x1 x2
   y1 y2      00     01     10     11
     00      0.5     0      0      0
     01       0     0.5    0.5    0.5
     10       0     0.5    0.5    0.5
     11      0.5     0      0      0

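It may be verified [using Eq. (2-2)] that this channel is MC. Suppose this channel is MS; then, for some set of subchannel conditional probability distributions $\{p_\alpha(\xi/\eta)\}$, $\alpha \in A$, $\xi, \eta \in \{0, 1\}$, and some joint distribution $p(\alpha_1, \alpha_2)$, we have

$$p(y_1 y_2 / x_1 x_2) = \sum_{\alpha_1 \in A}\sum_{\alpha_2 \in A} p(\alpha_1, \alpha_2)\, p_{\alpha_1}(y_1/x_1)\, p_{\alpha_2}(y_2/x_2) . \tag{A-1}$$

For each value of $\alpha$, we may depict $p_\alpha(\xi/\eta)$ as a binary channel with transition probabilities $a(\alpha)$ and $b(\alpha)$, where

$$0 \le a(\alpha) \le 1 , \quad 0 \le b(\alpha) \le 1 .$$

[Diagram of the binary subchannel with transition probabilities $a(\alpha)$ and $b(\alpha)$.]

Let us consider the four "pure" channels $q_1, \ldots, q_4$ which are diagrammed below:

[Diagrams of the four pure channels. From the relations used in the remainder of the proof, $q_1$ is the noiseless channel, $q_2$ always produces output 0, $q_3$ always produces output 1, and $q_4$ inverts its input.]

One can easily see that $p_\alpha(\xi/\eta)$ may be represented as follows:

$$p_\alpha(\xi/\eta) = (1 - a)(1 - b)\,q_1(\xi/\eta) + b(1 - a)\,q_2(\xi/\eta) + a(1 - b)\,q_3(\xi/\eta) + ab\,q_4(\xi/\eta) . \tag{A-2}$$

Note that all the coefficients of the $q_i$ are non-negative, and that a and b are functions of $\alpha$. Since $p(\alpha_1, \alpha_2) \ge 0$ for all $\alpha_1, \alpha_2 \in A$, Eqs. (A-1) and (A-2) imply that there exist variables $\beta_1$ and $\beta_2$, each of which may take on values 1, 2, 3, and 4, and a joint probability distribution $p(\beta_1, \beta_2)$ such that

$$p(y_1 y_2 / x_1 x_2) = \sum_{\beta_1 = 1}^{4}\sum_{\beta_2 = 1}^{4} p(\beta_1, \beta_2)\, q_{\beta_1}(y_1/x_1)\, q_{\beta_2}(y_2/x_2) . \tag{A-3}$$

[We know that $p(\beta_1, \beta_2)$ is a probability distribution because the non-negativity of the coefficients in Eqs. (A-1) and (A-2) implies $p(\beta_1, \beta_2) \ge 0$, and $\sum_{y_i} q_{\beta_i}(y_i/x_i) = 1$, $i = 1, 2$, implies $\sum_{\beta_1}\sum_{\beta_2} p(\beta_1, \beta_2) = 1$ from Eq. (A-3).] It is clear that Eq. (A-3) holds even if $p(\alpha_1, \alpha_2)$ is a joint density and the sums in Eq. (A-1) are replaced by integrals.

Suppose Eq. (A-3) holds for the MC channel given. Setting $x_1 = y_1 = 0$, we have

$$p(0y_2/0x_2) = \sum_{\beta_1 = 1}^{4}\sum_{\beta_2 = 1}^{4} p(\beta_1, \beta_2)\, q_{\beta_1}(0/0)\, q_{\beta_2}(y_2/x_2)$$

$$= \sum_{\beta_2 = 1}^{4} p(1, \beta_2)\, q_{\beta_2}(y_2/x_2) + \sum_{\beta_2 = 1}^{4} p(2, \beta_2)\, q_{\beta_2}(y_2/x_2) . \tag{A-4}$$

Since $p(\beta_1, \beta_2) \ge 0$ and $p(0y_2/0x_2) = 0.5\,q_1(y_2/x_2)$, only the terms of Eq. (A-4) with $\beta_2 = 1$ may be nonzero. [Otherwise $p(01/00) > 0$ or $p(00/01) > 0$ or both, contrary to assumption.] Hence,

$$0.5 = p(1, 1) + p(2, 1) .$$

From Eq. (A-3),

$$p(11/11) = p(1, 1) + p(1, 3) + p(3, 1) + p(3, 3) \ge p(1, 1) \ge 0 .$$

Since $p(11/11) = 0$, $p(1, 1) = 0$ as well. Thus, $p(2, 1) = 0.5$. We have

$$p(00/10) = p(2, 1) + p(2, 2) + p(4, 1) + p(4, 2) \ge p(2, 1) = 0.5 .$$

But $p(00/10)$ was given to be zero; hence, we have a contradiction which proves that the channel whose transition probability matrix is given above is not MS.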
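
The two facts used in this proof can be checked mechanically. The Python sketch below is an illustration, not part of the report. It assumes that the MC condition of Eq. (2-2), which is not reproduced in this section, amounts to each subchannel output's marginal distribution depending only on its own input. For the MS part it uses the observation behind Eqs. (A-2) and (A-3): an MS channel is a non-negative mixture of pairs of pure (deterministic) subchannels, so any pair carrying positive weight must never produce an output that the matrix assigns probability zero. All names are hypothetical.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Keyed by output pair (y1, y2); list positions follow INPUTS, i.e. (x1, x2).
ROWS = {
    (0, 0): [0.5, 0.0, 0.0, 0.0],
    (0, 1): [0.0, 0.5, 0.5, 0.5],
    (1, 0): [0.0, 0.5, 0.5, 0.5],
    (1, 1): [0.5, 0.0, 0.0, 0.0],
}


def p(y, x):
    """p(y1 y2 / x1 x2) from the matrix in the proof of Theorem A.1."""
    return ROWS[y][INPUTS.index(x)]


def output_marginal(which, value, x):
    """P(y_which = value | x), summing over the other output."""
    return sum(p(y, x) for y in ROWS if y[which] == value)


def has_no_crosstalk():
    """Each output's marginal must not change when the other input changes."""
    for own, other_a, other_b, value in product((0, 1), repeat=4):
        d1 = output_marginal(0, value, (own, other_a)) - output_marginal(0, value, (own, other_b))
        d2 = output_marginal(1, value, (other_a, own)) - output_marginal(1, value, (other_b, own))
        if abs(d1) > 1e-12 or abs(d2) > 1e-12:
            return False
    return True


def admissible_pure_pairs():
    """Pairs of deterministic maps whose outputs always land on nonzero entries."""
    maps = list(product((0, 1), repeat=2))  # f[x] is the output for input x
    return [(f1, f2) for f1, f2 in product(maps, repeat=2)
            if all(p((f1[x1], f2[x2]), (x1, x2)) > 0 for x1, x2 in INPUTS)]
```

`has_no_crosstalk()` returns `True`, while `admissible_pure_pairs()` returns an empty list: none of the 16 pairs of pure channels is compatible with the zero entries of the matrix, so no mixture of them can reproduce it, in agreement with the contradiction reached above.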

Theorem A.2.

There exists an M-subchannel MC channel with $X_s = Y_s = \{0, 1\}$ which is not MS.

Proof.

Pick a single-subchannel conditional probability distribution $p(\xi/\eta)$, $\xi \in Y_s$, $\eta \in X_s$, and let

$$p(y_1, \ldots, y_M / x_1, \ldots, x_M) = p(y_1 y_2 / x_1 x_2) \prod_{i=3}^{M} p(y_i/x_i) \tag{A-5}$$

where $p(y_1 y_2 / x_1 x_2)$ is given by the matrix shown in the proof of Theorem A.1. It may easily be verified that the RHS of Eq. (A-5) is the conditional probability distribution of an MC channel. Suppose that for some joint distribution $p(\alpha_1, \ldots, \alpha_M)$, we have

$$p(y_1, \ldots, y_M / x_1, \ldots, x_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M) . \tag{A-6}$$

Then,

$$\sum_{y_3 \in Y_s} \cdots \sum_{y_M \in Y_s} p(y_1, \ldots, y_M / x_1, \ldots, x_M) = \sum_{\alpha_1 \in A}\sum_{\alpha_2 \in A} p(\alpha_1, \alpha_2)\, p_{\alpha_1}(y_1/x_1)\, p_{\alpha_2}(y_2/x_2) = p(y_1 y_2 / x_1 x_2) . \tag{A-7}$$

But this implies that the channel whose transition probability matrix is given in the proof of Theorem A.1 is MS, contrary to Theorem A.1.

Theorem  A.3. 


For  any  integer  M,  subchannel  input  space  X^,  and  subchannel  output  space  Y^,  there  exist 
MC  channels  which  are  not  MS. 

Proof. 

Let  X’^  =  p’^y’^'  '  •'  ’  '  *  '  '^M^  given  by  the  RHS  of  Eq.  (A-5)  with 

x^  xl,  and  y^  y'.  .  Define  a  function  f(x)  from  X^  onto  X^,  and  two  probability  distributions 

(or  densities)  p  (|  )  and  p .  (|  )  over  Y  such  that  p^(|  )  pAi)  =  0  for  all  |  e  Y  .  Define^ 

O  1  s  O  1  s 


P^^l . . ^ 


M 


n  Py-(yi> 


i=l 


X  p'[y\ . . f(x^^)| 


(A-8) 


The  relationship  between  the  primed  (original)  and  unprimed  (derived)  channels  is  shown  in 
Fig.  A-1.  Since  the  primed  ehannel  is  MC,  clearly  the  unprimed  ehannel  is  MC,  too.  Xow,  sup¬ 
pose  for  some  (Pq, A )} q, ^  ^  Yg*  V  ^  and  some  joint  distribution  p(q;^,  ....  have 


P^^l . . 


V 


a  .  €A 
1 


L  Pf^H  ’  •  •  '  »  ^ Vf^ 


X  PaE^i/xi)  ■  Pa  (yM/x^,j) 

1  M 


(A-9) 


tAs  usual,  p  (y.)  =  p(y./ y'.). 
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|3-63-T3IO| 


Let 


and 


Fig. A-1. Relationship between original and derived channels.


Y  be  the  set  of  4  ^  Y  for  which  p  )  •  0 
so  s  ^o 


Y  .  be  the  set  of  4  e  Y  for  which  p  (^  )  0 

si  s 


Since 


-  P  (^  )  P.(s  )  =  0  for  all  4  ^  Y  ,  Y  and  Y  ,  are  disioint.  If  r.  t  define' 
1  s  so  si  •  is 


P^  (r./x.)=  V  (y./xp 

^  V.eY  ^ 

‘  1  sr. 

1 


(A-10) 


Note  that 


S  Py,(yi)  ^  <5^/ 

y.CY  ^  ^ 

1  r 


(A-11) 


Pick 


and 


Choose 


so  that 


. ‘'m>  ■  . 


. . 


. ^1*  ■  . 


f(x,)  =  u. 


i  =  1 ,  .  .  .  ,  M 


(A-12) 
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From  F^qs.  (A-8)  and  (A-9), 


V 

U 

v’  eY' 

1  s 


E 


M 

n  Py'/^i) 
i=l  ^ 


p'ly'i . . 


•  S  ■■  E  pi«, . “m' p. 

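Summing both sides of Eq. (A-13) over all $y_i \in Y_{s r_i}$, $1 \le i \le M$, and using Eqs. (A-10), (A-11),
and (A-12), we get

$$
\sum_{y_1' \in Y_s'} \cdots \sum_{y_M' \in Y_s'} \left[ \prod_{i=1}^{M} \delta_{r_i y_i'} \right] p'[y_1', \ldots, y_M'/u_1, \ldots, u_M]
= \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, P_{\alpha_1}(r_1/x_1) \cdots P_{\alpha_M}(r_M/x_M)
\tag{A-14}
$$

This reduces further to

$$
p'(r_1, \ldots, r_M/u_1, \ldots, u_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, w_{\alpha_1}(r_1/u_1) \cdots w_{\alpha_M}(r_M/u_M)
\tag{A-15}
$$

where $w_{\alpha_i}(r_i/u_i)$ is defined by

$$
w_{\alpha_i}[r_i/f(x_i)] = P_{\alpha_i}(r_i/x_i)
\tag{A-16}
$$

for all $\alpha_i \in A$, $r_i \in Y_s'$, and $x_i \in X_s$. Clearly, $w_\alpha$ is a conditional distribution on $Y_s' \times X_s'$.

Since $(r_1, \ldots, r_M)$ and $(u_1, \ldots, u_M)$ were arbitrary, Eq. (A-15) implies that the primed channel
is MS, contrary to Theorem A.2.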


Theorem  A. 4. 

There exists an Nil channel with $X_s = Y_s = \{0, 1\}$, which is not MST.

Proof. 

For $i$ an even integer, and $N$ an even positive integer, define

$$
p(y_{i+1}, \ldots, y_{i+N}/x_{i+1}, \ldots, x_{i+N}) = \prod_{k=1}^{N/2} p(y_{i+2k-1}, y_{i+2k}/x_{i+2k-1}, x_{i+2k}),
\tag{A-17}
$$

where the bivariate conditional probability is given by the matrix shown in the proof of
Theorem A.1. The conditional probability for other values of $i$ and $N$ can be obtained by
summing conditional probabilities of the form given over appropriate outputs. Since each odd
input-output pair and the succeeding even one "stand alone," it is clear (an argument similar to
that used to prove Theorem A.2 may be used) that the channel given by Eq. (A-17) is Nil but not
MST.

The restriction to binary subchannel alphabets may be removed as in Theorem A.3.

We  note  that  the  channel  given  by  Eq.  (A-17)  is  cyclostationary  rather  than  stationary.  In 
fact,  there  is  a  unique  stationary  channel  with  bivariate  conditional  probabilities  given  by  the 
entries  in  the  following  matrix: 


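$$
\begin{array}{c|cccc}
x_i x_{i+1} \,\backslash\, y_i y_{i+1} & 00 & 01 & 10 & 11 \\ \hline
00 & 0.5 & 0 & 0 & 0.5 \\
01 & 0 & 0.5 & 0.5 & 0 \\
10 & 0 & 0.5 & 0.5 & 0 \\
11 & 0 & 0.5 & 0.5 & 0
\end{array}
$$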


This  channel  has  the  property  that  given  a  single  input  sequence,  there  are  only  two  possible 
output  sequences,  each  having  probability  one-half.  This  is  true  regardless  of  the  length  of 
the  input  sequence.  Unfortunately,  this  channel  is  not  Nil,  and  this  fact  is  proven  as  follows: 

(1) The input $(x_1, x_2, x_3) = (0, 0, 1)$ may result in the outputs $(y_1, y_2, y_3) = (0, 0, 1)$ or $(1, 1, 0)$, each with probability one-half.

(2) The input $(x_1, x_2, x_3) = (0, 1, 1)$ may result in the outputs $(y_1, y_2, y_3) = (0, 1, 0)$ or $(1, 0, 1)$, each with probability one-half.

Hence,

$$
p(001/001) + p(011/001) = \tfrac{1}{2} + 0 = \tfrac{1}{2}
$$

and

$$
p(001/011) + p(011/011) = 0 + 0 = 0
$$

Thus,  the  channel  referred  to  is  not  Nil.  It  is  not  known  whether  there  exist  strictly  stationary 
Nil  channels  which  are  not  MST. 
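
The two sums above can be checked numerically. The sketch below is illustrative only: it assumes the stationary channel can be generated by the rule "$y_1$ is a fair bit and $y_{i+1} = y_i \oplus (x_i \lor x_{i+1})$", which is our reading of the bivariate matrix and of examples (1) and (2), not a construction stated in the report. The function names `outputs`, `prob`, `pair_matrix`, and `marginal_y1_y3` are hypothetical.

```python
from itertools import product


def outputs(x):
    """Return the two equiprobable output sequences for input sequence x.

    Assumed rule (our reading of the bivariate matrix): y_1 is a fair bit
    and y_{i+1} = y_i XOR (x_i OR x_{i+1}).
    """
    result = []
    for y1 in (0, 1):
        y = [y1]
        for a, b in zip(x, x[1:]):
            y.append(y[-1] ^ (a | b))
        result.append(tuple(y))
    return result


def prob(y, x):
    """p(y/x) for the assumed channel: one-half on each possible output."""
    return 0.5 if tuple(y) in outputs(x) else 0.0


def pair_matrix():
    """Bivariate conditional probabilities; rows are x pairs, columns y pairs."""
    pairs = list(product((0, 1), repeat=2))
    return [[prob(y, x) for y in pairs] for x in pairs]


def marginal_y1_y3(y1, y3, x):
    """Sum p(y1 y2 y3 / x1 x2 x3) over the middle output y2."""
    return sum(prob((y1, y2, y3), x) for y2 in (0, 1))


if __name__ == "__main__":
    for row in pair_matrix():
        print(row)
    # The two values differ although only the middle input x2 changed.
    print(marginal_y1_y3(0, 1, (0, 0, 1)), marginal_y1_y3(0, 1, (0, 1, 1)))
```

Under this assumed rule, the probability of the outer outputs $(y_1, y_3) = (0, 1)$ changes when only $x_2$ is changed, which is the dependence used in the argument above.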
APPENDIX B

PROOFS OF THEOREMS 2.3 AND 2.4


Let $P$ be the set of probability distributions over a finite product space $X_s^M$, and $D$ be the
set of product distributions over $X_s^M$. Let $X_s^M$ consist of $K$ points. Clearly,

$$
D \subset P \subset \mathbb{R}^K
\tag{B-1}
$$

and $P$ is compact.†

Lemma.

$D$ is compact in $\mathbb{R}^K$.

Proof.


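$D$ is bounded because $D \subset P$, and $P$ is bounded. Let $p(x_1, \ldots, x_M)$ be a limit point of $D$.
Then, there exists a sequence $\left\{ \prod_{i=1}^{M} q_i^{\ell}(x_i) \right\}_{\ell=1}^{\infty}$ of product distributions with

$$
\lim_{\ell \to \infty} \sum_{x_1 \in X_s} \cdots \sum_{x_M \in X_s} \left[ p(x_1, \ldots, x_M) - \prod_{i=1}^{M} q_i^{\ell}(x_i) \right]^2 = 0
\tag{B-2}
$$

Since the sums are finite, this implies

$$
\lim_{\ell \to \infty} \sum_{x_1 \in X_s} \cdots \sum_{x_M \in X_s} \left| p(x_1, \ldots, x_M) - \prod_{i=1}^{M} q_i^{\ell}(x_i) \right| = 0
\tag{B-3}
$$

Now,

$$
\sum_{x_1 \in X_s} \cdots \sum_{x_M \in X_s} \left| p(x_1, \ldots, x_M) - \prod_{j=1}^{M} q_j^{\ell}(x_j) \right| \ge \sum_{x_i \in X_s} \left| p_i(x_i) - q_i^{\ell}(x_i) \right| \ge 0, \qquad i = 1, \ldots, M
\tag{B-4}
$$

where $p_i(x_i)$ is the marginal distribution of the $i$th subchannel input associated with the joint
distribution $p(x_1, \ldots, x_M)$. From Eqs. (B-3) and (B-4),

$$
\lim_{\ell \to \infty} q_i^{\ell}(x_i) = p_i(x_i)
\tag{B-5}
$$

From Eqs. (B-2) and (B-5),

$$
p(x_1, \ldots, x_M) = \lim_{\ell \to \infty} \prod_{i=1}^{M} q_i^{\ell}(x_i) = \prod_{i=1}^{M} p_i(x_i)
\tag{B-6}
$$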


Hence, $p(x_1, \ldots, x_M)$ is a product distribution, and $D$ is closed. Since $D$ is closed and bounded,
it is compact.

† That is, closed and bounded. See W. Rudin, Principles of Mathematical Analysis, 2nd edition (McGraw-Hill,
New York, 1964), Theorem 2.41.


Proof of Theorem 2.3.

We note that since $D$ and $P$ are compact subspaces of $\mathbb{R}^K$, $D \times [0, 1]$ and $P \times [0, 1]$ are
compact subspaces of $\mathbb{R}^{K+1}$.

(1) Since a continuous function defined on a compact set is bounded,†
$F_D(R)$, $G_D(R)$, $F_P(R)$, and $G_P(R)$ are finite.

(2) Since a continuous function defined on a compact set achieves its maximum,‡
for each value of $R$ there exist $0 \le \rho^* \le 1$ and $p^* \in D$ with

$$
F_D(R) = f(\rho^*, p^*, R)
\tag{B-7}
$$

Thus,

$$
F_D(R) = f(\rho^*, p^*, R) \le g(\rho^*, p^*, R) \le G_D(R) \le G_P(R)
\tag{B-8}
$$

as required. The last inequality is an obvious consequence of Eq. (B-1).
Clearly, if $f < g$, the inequality between $F_D$ and $G_D$ is strict.

(3) For the same reason as in (2) above, for each value of $R$ there exist
$0 \le \rho' \le 1$ and $p' \in P$ with

$$
F_P(R) = f(\rho', p', R)
\tag{B-9}
$$

Hence,

$$
F_D(R) \le F_P(R) = f(\rho', p', R) \le g(\rho', p', R) \le G_P(R)
\tag{B-10}
$$

The first inequality is an obvious consequence of Eq. (B-1). It is clear
that if $f < g$, the inequality between $F_P$ and $G_P$ is strict.


Proof of Theorem 2.4.

(1) For each $R$, pick $\epsilon > 0$. There exist $0 \le \rho' \le 1$ and $p' \in D$ with

$$
0 \le F_D(R) - f(\rho', p', R) \le \epsilon
\tag{B-11}
$$

Thus,

$$
G_D(R) - F_D(R) = [G_D(R) - g(\rho', p', R)] + [g(\rho', p', R) - f(\rho', p', R)] + [f(\rho', p', R) - F_D(R)] \ge 0 + 0 - \epsilon = -\epsilon
$$

Since $\epsilon$ was arbitrary, we have

$$
G_D(R) - F_D(R) \ge 0
$$

as required. The rest of Eq. (2-22) follows from Eq. (B-1).

(2) The proof here proceeds as in (1) with $P$ replacing $D$.

We note that if $f$ and/or $g$ fail to depend on any or all of $\rho$, $p$, and $R$, the conclusions of
Theorems 2.3 and 2.4 remain valid.

† W. Rudin, op. cit., Theorem 4.15.

‡ Ibid., Theorem 4.16. If this were not so, we would have used l.u.b. instead of max in the definitions of
$F_D$, $G_D$, $F_P$, and $G_P$.
APPENDIX C

SOME USEFUL INEQUALITIES


This appendix contains statements of the important nontrivial inequalities used in the text.
A reference is given for each inequality stated. In any inequality in which $\lambda$ appears, we assume
$0 < \lambda < 1$.

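(1) Let $\{a_i\}_{i=1}^{N}$ be a sequence of non-negative numbers. Then†

$$
\left[ \sum_{i=1}^{N} a_i \right]^{\lambda} \le \sum_{i=1}^{N} a_i^{\lambda}
\tag{C-1}
$$

(2) Let $\{a_i\}_{i=1}^{N}$ and $\{b_i\}_{i=1}^{N}$ be sequences of non-negative numbers with $\sum_{i=1}^{N} b_i = 1$. Then‡

$$
\sum_{i=1}^{N} b_i a_i^{\lambda} \le \left( \sum_{i=1}^{N} b_i a_i \right)^{\lambda}
\tag{C-2}
$$

(3) Minkowski's inequality: let $\{a_{ij}\}$ and $\{p_j\}_{j=1}^{M}$ be sets of non-negative
numbers with $\sum_{j=1}^{M} p_j = 1$. Then,§

$$
\sum_{i=1}^{N} \left( \sum_{j=1}^{M} p_j a_{ij}^{\lambda} \right)^{1/\lambda} \le \left[ \sum_{j=1}^{M} p_j \left( \sum_{i=1}^{N} a_{ij} \right)^{\lambda} \right]^{1/\lambda}
\tag{C-3}
$$

† G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities (Cambridge University Press, Cambridge, England,
1959). See Theorem 19, p. 28.

‡ Ibid., Theorem 16, p. 22.

§ Ibid., Theorem 24, p. 30.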
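
The three inequalities can be spot-checked numerically. The following is a minimal sketch, not part of the report: the helper names `c1`, `c2`, `c3`, and `random_check` are hypothetical, the exponent is written `lam`, and random trials can only illustrate the inequalities, not prove them.

```python
import random


def c1(a, lam):
    """Return (lhs, rhs) of (C-1): (sum a_i)^lam <= sum a_i^lam."""
    return sum(a) ** lam, sum(x ** lam for x in a)


def c2(a, b, lam):
    """Return (lhs, rhs) of (C-2); the weights b are assumed to sum to 1."""
    lhs = sum(bi * ai ** lam for ai, bi in zip(a, b))
    rhs = sum(bi * ai for ai, bi in zip(a, b)) ** lam
    return lhs, rhs


def c3(a, p, lam):
    """Return (lhs, rhs) of (C-3); a[i][j] is a_ij, p[j] sums to 1."""
    lhs = sum(
        sum(pj * aij ** lam for aij, pj in zip(row, p)) ** (1.0 / lam)
        for row in a
    )
    col_sums = [sum(row[j] for row in a) for j in range(len(p))]
    rhs = sum(pj * s ** lam for pj, s in zip(p, col_sums)) ** (1.0 / lam)
    return lhs, rhs


def random_check(trials=200, seed=0, tol=1e-9):
    """Draw random non-negative data and confirm lhs <= rhs in all three cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, m = rng.randint(1, 6), rng.randint(1, 6)
        lam = rng.uniform(0.05, 0.95)
        a = [rng.random() for _ in range(n)]
        w = [rng.random() + 1e-3 for _ in range(n)]
        b = [x / sum(w) for x in w]
        v = [rng.random() + 1e-3 for _ in range(m)]
        p = [x / sum(v) for x in v]
        a2 = [[rng.random() for _ in range(m)] for _ in range(n)]
        for lhs, rhs in (c1(a, lam), c2(a, b, lam), c3(a2, p, lam)):
            if lhs > rhs * (1 + tol) + tol:
                return False
    return True


if __name__ == "__main__":
    print(random_check())
```

A small tolerance is used in the comparison because equality can occur (for example, when $N = 1$) and floating-point rounding may then favor either side.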
APPENDIX D

CANONICAL REPRESENTATIONS


Theorem D.1.

For any MS channel with finite input and output alphabets, there always exists a representation
for which in each channel state the subchannel conditional probabilities are all either zero or unity.

Proof.

Let $A$, $X_s = \{1, \ldots, L\}$, $Y_s = \{1, \ldots, Q\}$, $p_\alpha(q/\ell)$ for $\alpha \in A$, $q \in Y_s$, $\ell \in X_s$, and $p(\alpha_1, \ldots, \alpha_M)$ be
given and

$$
p(y_1, \ldots, y_M/x_1, \ldots, x_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, p_{\alpha_1}(y_1/x_1) \cdots p_{\alpha_M}(y_M/x_M)
\tag{D-1}
$$


We define a pure (sub)channel as one for which the input completely determines the output. There
are $Q^L$ possible different pure subchannels. We denote the conditional probability distribution
associated with a pure subchannel by $s_\beta(q/\ell)$, $\ell \in X_s$, $q \in Y_s$, where $1 \le \beta \le Q^L$. We note that for
each $\beta$, $\ell$ there is a unique value of $q$ with $s_\beta(q/\ell) = 1$.

If $1 \le \beta_k \le Q^L$ for $1 \le k \le M$, then

$$
p(y/x) = \prod_{k=1}^{M} s_{\beta_k}(y_k/x_k)
\tag{D-2}
$$

is the conditional probability distribution of a pure channel with $L^M$ inputs and $Q^M$ outputs.

For each $\alpha \in A$, we shall show that it is possible to expand $p_\alpha(q/\ell)$ in a series of $s_n(q/\ell)$
with non-negative coefficients, as shown in Eq. (D-3).

$$
p_\alpha(q/\ell) = \sum_{n=1}^{Q^L} C_n^{\alpha}\, s_n(q/\ell)
\tag{D-3}
$$

where $C_n^{\alpha} \ge 0$, $\alpha \in A$, and $1 \le n \le Q^L$. The expansion Eq. (D-3) is not generally unique, but one
way of finding the $C_n^{\alpha}$ is to proceed as follows: Let

$$
p^0(q/\ell) = p_\alpha(q/\ell)
\tag{D-4}
$$

By definition of a probability distribution, $p^0(q/\ell) \ge 0$, $1 \le \ell \le L$, $1 \le q \le Q$. Find the smallest
nonzero transition probability in the set $\{p^0(q/\ell)\}_{\ell=1,\,q=1}^{L,\,Q}$. Call it $r^1$. Now,

$$
\sum_{q=1}^{Q} p^0(q/\ell) = 1
\tag{D-5}
$$

for all $\ell$, $1 \le \ell \le L$. Thus, there must be a (not necessarily unique) function $q^1(\ell)$, $1 \le q^1(\ell) \le Q$,
with $p^0[q^1(\ell)/\ell] > 0$, $1 \le \ell \le L$. By definition of $r^1$,

$$
p^0[q^1(\ell)/\ell] \ge r^1
\tag{D-6}
$$

for all $\ell$, $1 \le \ell \le L$. Choose $q^1(\ell)$ so that Eq. (D-6) is satisfied with equality for at least one
value of $\ell$. Let $s_{\beta_1}$ be the pure subchannel with

$$
s_{\beta_1}[q^1(\ell)/\ell] = 1
\tag{D-7}
$$

Let

$$
p^1(q/\ell) = p^0(q/\ell) - r^1 s_{\beta_1}(q/\ell)
\tag{D-8}
$$

Clearly,

$$
p^1(q/\ell) \ge 0, \qquad 1 \le \ell \le L, \quad 1 \le q \le Q
$$

If $p^1(q/\ell)$ is not identically zero, the process may be continued. Note that at each step of the
process, a positive value of $p^m(q/\ell)$ is converted to a zero value of $p^{m+1}(q/\ell)$. Since there are,
at most, $LQ$ positive values of $p^0(q/\ell)$, the process must terminate after, at most, $LQ$ steps,
and we may write

$$
p_\alpha(q/\ell) = \sum_{m=1}^{N} r^m s_{\beta_m}(q/\ell)
\tag{D-9}
$$

where

$$
N \le LQ
\tag{D-10}
$$

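Equation (D-9) may be converted to Eq. (D-3) if $r^m = C_m^{\alpha}$ and the $s_\beta(q/\ell)$ are renumbered so
that $\beta_m = m$. Now, from Eq. (D-3),

$$
\prod_{i=1}^{M} p_{\alpha_i}(y_i/x_i) = \prod_{i=1}^{M} \left[ \sum_{n_i=1}^{Q^L} C_{n_i}^{\alpha_i}\, s_{n_i}(y_i/x_i) \right]
\tag{D-11}
$$

Thus, from Eqs. (D-1) and (D-11),

$$
\begin{aligned}
p(y_1, \ldots, y_M/x_1, \ldots, x_M)
&= \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M) \prod_{i=1}^{M} \left[ \sum_{n_i=1}^{Q^L} C_{n_i}^{\alpha_i}\, s_{n_i}(y_i/x_i) \right] \\
&= \sum_{n_1=1}^{Q^L} \cdots \sum_{n_M=1}^{Q^L} \left[ \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, C_{n_1}^{\alpha_1} \cdots C_{n_M}^{\alpha_M} \right] s_{n_1}(y_1/x_1) \cdots s_{n_M}(y_M/x_M) \\
&= \sum_{n_1=1}^{Q^L} \cdots \sum_{n_M=1}^{Q^L} d(n_1, \ldots, n_M)\, s_{n_1}(y_1/x_1) \cdots s_{n_M}(y_M/x_M)
\end{aligned}
\tag{D-12}
$$

where

$$
d(n_1, \ldots, n_M) = \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_M \in A} p(\alpha_1, \ldots, \alpha_M)\, C_{n_1}^{\alpha_1} \cdots C_{n_M}^{\alpha_M}
\tag{D-13}
$$

$d(n_1, \ldots, n_M)$ is clearly a probability distribution because it is non-negative valued by Eq. (D-13),
and a summation of both sides of Eq. (D-12) over $y_1, \ldots, y_M$ shows that it is properly normalized.
Thus, we have our desired representation.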
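
The stepwise expansion of a single subchannel into pure subchannels, Eqs. (D-4) through (D-10), can be sketched in code. This is an illustrative sketch under stated assumptions, not the report's program: the names `decompose` and `rebuild` are hypothetical, inputs and outputs are indexed from 0 rather than 1, and exact rational arithmetic (`fractions.Fraction`) is assumed so that every row of the residual keeps the same sum at each step.

```python
from fractions import Fraction


def decompose(P):
    """Expand a row-stochastic matrix P[l][q] = p(q/l) into pure subchannels.

    Returns a list of (r, qmap) pairs: r > 0 is a coefficient and qmap[l]
    is the single output that the pure subchannel assigns to input l.
    Entries of P should be exact rationals (ints, Fractions, or strings).
    """
    residual = [[Fraction(v) for v in row] for row in P]
    terms = []
    while any(v > 0 for row in residual for v in row):
        # Smallest nonzero residual entry r and its position (l0, q0).
        r, l0, q0 = min(
            (v, l, q)
            for l, row in enumerate(residual)
            for q, v in enumerate(row)
            if v > 0
        )
        # For each input choose an output with a positive residual entry.
        # The row holding the minimum uses that entry, so it drops to zero.
        qmap = tuple(
            q0 if l == l0 else next(q for q, v in enumerate(row) if v > 0)
            for l, row in enumerate(residual)
        )
        for l, q in enumerate(qmap):
            residual[l][q] -= r
        terms.append((r, qmap))
    return terms


def rebuild(terms, num_inputs, num_outputs):
    """Sum r * s(q/l) over the returned pure subchannels."""
    P = [[Fraction(0)] * num_outputs for _ in range(num_inputs)]
    for r, qmap in terms:
        for l, q in enumerate(qmap):
            P[l][q] += r
    return P


if __name__ == "__main__":
    P = [["1/2", "1/2", "0"], ["1/3", "0", "2/3"]]
    for r, qmap in decompose(P):
        print(r, qmap)
```

Each pass removes at least one positive entry, which mirrors the termination argument in the text; with floating-point inputs the row sums may drift apart, which is why exact rationals are assumed here.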
APPENDIX E

A COMPLETELY CONSTRAINED CHANNEL
WITH A CONTINUOUS PARAMETER


In this appendix, we shall give an example of an MSCC channel whose capacity per subchannel
with state unknown approaches the capacity of a single subchannel with state known as the number
of subchannels increases, although the state representation for the channel is not discrete. Suppose
we have an MS channel consisting of $M$ parallel binary symmetric subchannels. For each
use of the channel, the crossover probability $g$ is the same for all the subchannels. Thus, this
channel is MSCC. A probability density $p(g)$ is given with $p(g) = 0$, unless $0 \le g \le 1$. We will suppose
that each possible input is used with probability $(1/2)^M$, independently of the channel state.

It is easy to see that this is the input distribution which achieves capacity, whether or not the state
is known at the receiver. Now,

$$
I(XY; G) = I(Y; G) + I(X; G/Y)
\tag{E-1}
$$

By symmetry, $p(y/g) = (1/2)^M = p(y)$. Thus, $y$ and $g$ are independent, and $I(Y; G) = 0$. Hence,

$$
I(XY; G) = I(X; G/Y)
\tag{E-2}
$$

Combining Eqs. (3-4) and (E-2), we have

$$
I(X; Y) = I(X; Y/G) - I(XY; G)
\tag{E-3}
$$

For each value of $g$, $p(y/xg)$ depends only on the Hamming distance $d_H(x, y)$ between $x$ and $y$.
Let $D$ be the ensemble of such distances.

$$
\begin{aligned}
I(XYD; G) &= I(XY; G) + I(D; G/XY) \\
&= I(D; G) + I(XY; G/D)
\end{aligned}
\tag{E-4}
$$

Now,

$$
p(d/gxy) = p(d/xy)
$$

and

$$
p(xy/gd) = p(xy/d)
\tag{E-5}
$$

Hence,

$$
I(D; G/XY) = 0 = I(XY; G/D)
\tag{E-6}
$$

and, combining Eqs. (E-6) and (E-4), we get

$$
I(XY; G) = I(D; G)
\tag{E-7}
$$

Combining Eqs. (E-7) and (E-3), we get

$$
I(X; Y) = I(X; Y/G) - I(D; G)
\tag{E-8}
$$

Now, $D = \{0, 1, \ldots, M\}$; hence,

$$
I(D; G) \le H(D) \le \log(M + 1)
\tag{E-9}
$$

Thus, using Theorem 3.1, Eqs. (E-8) and (E-9), we have

$$
I(X; Y/G) - \log(M + 1) \le I(X; Y) \le I(X; Y/G)
\tag{E-10}
$$

We may readily compute

$$
I(X; Y/G) = M \int_0^1 [1 + g \log g + (1 - g) \log(1 - g)]\, p(g)\, dg = M\, I(X_1; Y_1/G)
\tag{E-11}
$$

Thus, from Eqs. (E-10) and (E-11),

$$
I(X_1; Y_1/G) - \frac{\log(M + 1)}{M} \le \frac{I(X; Y)}{M} \le I(X_1; Y_1/G)
\tag{E-12}
$$

Now,

$$
\lim_{M \to \infty} \frac{\log(M + 1)}{M} = 0
\tag{E-13}
$$

Hence,

$$
\lim_{M \to \infty} \frac{I(X; Y)}{M} = I(X_1; Y_1/G)
\tag{E-14}
$$

From the remarks preceding Eq. (E-1), Eq. (E-14) implies

$$
\lim_{M \to \infty} C_{sM} = C'
\tag{E-15}
$$
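
The bound of Eq. (E-10) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the report's computation: it replaces the density $p(g)$ by a finite list of `(g, weight)` pairs (here a midpoint grid approximating a uniform density, chosen only as an example), uses logarithms to base 2, and the names `h2`, `info_state_known`, `info_state_unknown`, and `uniform_grid` are hypothetical.

```python
from math import comb, log2


def h2(g):
    """Binary entropy function in bits."""
    if g <= 0.0 or g >= 1.0:
        return 0.0
    return -g * log2(g) - (1.0 - g) * log2(1.0 - g)


def info_state_known(M, states):
    """I(X;Y/G) in bits: M times the average of 1 - h2(g)."""
    return M * sum(w * (1.0 - h2(g)) for g, w in states)


def info_state_unknown(M, states):
    """I(X;Y) in bits for equiprobable inputs and a common crossover g.

    The outputs are equiprobable, so H(Y) = M bits, and p(y/x) depends
    only on the Hamming distance d between x and y.
    """
    h_y_given_x = 0.0
    for d in range(M + 1):
        # Probability of one particular error pattern of weight d.
        q = sum(w * g ** d * (1.0 - g) ** (M - d) for g, w in states)
        if q > 0.0:
            h_y_given_x -= comb(M, d) * q * log2(q)
    return M - h_y_given_x


def uniform_grid(n):
    """Midpoint grid standing in for a uniform density on [0, 1]."""
    return [((k + 0.5) / n, 1.0 / n) for k in range(n)]


if __name__ == "__main__":
    states = uniform_grid(200)
    for M in (1, 2, 5, 20, 100):
        known = info_state_known(M, states)
        unknown = info_state_unknown(M, states)
        print(M, unknown / M, known / M)
```

For the example grid, the per-subchannel value with state unknown should move toward the state-known value as $M$ grows, in line with Eq. (E-14); the rate seen in such a run depends on the assumed state distribution.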
APPENDIX F

AN MSCC CHANNEL FOR WHICH $E_2'(R_s) \neq E_2(R_s)$


Consider  the  2SCC  channel  defined  [see  Eq.  (4-1)]  by  X 


a  =  1,2,  and 


0 


n  =  I 

Otherwise 


0  if  ^  =  2,  ....  7 

Y  if  ^  -  1,  H  and  rj  =  2.  ....  7 


1  if  (^  r])  -  (1.  1)  or  (8,  8) 
0  if  (^  T])  -  (1.  8)  or  (8.  1) 


Y  -  { 1 , 
s 


B),  p(fv)  =  1/2, 


The  subchannel  states  are  depicted  as  follows: 


iOr}} 


In  the  computation  of  ^.re  first  concerned  with  the  minimization  for  each  value 

of  p,  0  ^  p  1,  of  the  function  E^{p,  p)  over  all  p  e  P  [see  Eq.  (4-33)].  Using  Eq.  (4-19)  and 
taking  advantage  of  the  available  symmetries,  we  find  that  for  purposes  of  minimization  we  may 
consider  as  a  function  of  a  reduced  probability  vector  p^; 


F^(p,p^)  =  j  (4p^(l,  +  24p^(l,  2)^+'"  +  36p^(2,  2)^+'^] 

+  2  (p^(l,  1)  +  12p^(l,  2)  +  36p^(2,  2) 

where 

Pj.  =  [PpH,  1),  P^d,  2),  p^(2,  2)] 

p^(l,  1)  ^0  •  ,  p^(l,2)>0  ,  p^(2,  2)^0 

and 

4p^(l,  1)  +  24p^(l,  2)  +  36p^(2,  2)  =  1 
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The  components  p(77^,  t?^),  V  V 2  ~  ■  •  •  >  original  probability  vector  p  may  be 


obtained  from  those  of  the  reduced  probability  vector  p^  as  follows. 


Define 


Sj  =  (1,8} 


S2  =  (2 . 7} 


Ajj  =  €  S^,  ^2  e  S^} 

A^2“  I  V 2"^ ^^Z  ^1  ^^Z’  ^Z 

A^z  =  {<'?i,n2)Ai  «  Sz,  r,z  e  Sz)  . 

Then,  iri^.ri^)  ^  implies 


p(r!l.  1)2'  =  Pp^^’  j' 


i,  J  =  1,  2 


In  the  computation  of  E2(Rg),  we  are  first  concerned  with  the  minimization  for  each  value 
of  p,  0  <  p  1,  of  the  function  E^{p,  p)  over  all  p  c  D  [see  Eq.  (4-79)].  Using  Eq.  (4-47)  [replacing 
the  IwHS  by  G^(p,  q),  and  p^  by  q^  to  avoid  confusion]  and  taking  advantage  of  the  available  sym¬ 
metries,  we  find  that  for  purposes  of  minimization  we  may  consider  as  a  function  of  a  reduced 


probability  vector  q^: 


MP.q.)  -  T  +  6q_(Z)l+Pl^  +  Z  (q_(l)  +  6q_(Z)  ( |)  l/l+P  ,2(  1+P) 


where 


q  =  (q  <1),  q  (2)] 
^sr  ^^sr  ^sr  ^ 


q3,(l)>0 


q^^(2)  ^0 


and 

2q  (1)  +  6q  (2)  -  1  . 

sr  sr 

The  components  Qg(’7)»  ^  =  1,  •  •  •  j  8,  of  the  original  subchannel  probability  vector  q^  may  be 
obtained  from  those  of  the  reduced  probability  vector  q^^  as  follows: 


77  €  S.  implies  ^  ^sr^^^ 


i  =  1,  2 


Thus,  we  may  compute  (for  in  bits) 


and 


E' (R  )  =  max 

0<p^l 


E  ^(R  )  -  max 
O^p^l 


-2pR  In  2  -  In  min  FMp,  p^) 

S  ^  I 


—  2pRg  In  2  —  In  min  G^{p,  q^) 
Fig. F-1. $E_2'(R_s)$ and $E_2(R_s)$ vs $R_s$ (bits) for a particular MSCC channel.

The minimizations over $p_r$ and $q_{sr}$ were performed using the method of Zoutendijk.† The
entire computation was programmed on the IBM 360. Results are given in Fig. F-1, where the
vertical distance $\Delta$ between the straight-line portions of the two curves is 0.0318. The guaranteed
accuracy of $\Delta$ is given by the bound $0.0301 < \Delta < 0.0319$. This bound is believed to be
conservative (i.e., $\Delta$ is believed to be given by 0.0318 to three significant figures). The results
clearly demonstrate that $E_2'(R_s) \neq E_2(R_s)$ for this channel.

Table F-1 gives some of the values of $E_2'(R_s)$ and $E_2(R_s)$, together with the values of
$\rho$, $p_r$, and $q_{sr}$ which achieve the maxima required by the definitions of $E_2'(R_s)$ and $E_2(R_s)$.

The marginal distribution of subchannel inputs $p_s(\eta)$ (the same for both subchannels) corresponding
to $p_r$ may be computed as follows:

$$
p_{sr}(1) = 6 p_r(1, 2) + 2 p_r(1, 1)
$$

$$
p_{sr}(2) = 6 p_r(2, 2) + 2 p_r(1, 2)
$$

$$
\eta \in S_i \ \text{implies} \ p_s(\eta) = p_{sr}(i), \qquad i = 1, 2
$$

The capacity for this channel is 1.660964 bits, achieved by a product distribution (see
proof of Theorem 2.7) with $q_{sr} = (0.19992979, 0.10002340)$.

† G. Zoutendijk, Methods of Feasible Directions (Elsevier, Amsterdam, 1960).